Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

builder: use ar-chives for linking big sketches (was: use the @ syntax to reduce command line length if needed) #961

Merged
merged 5 commits into from
Nov 12, 2020

Conversation

cmaglie
Copy link
Member

@cmaglie cmaglie commented Sep 21, 2020

EDIT:
I've changed the implementation, see: #961 (comment) for more info.
The previous solution (with the @file.txt) is here: 7669f24


OLD solution:
When there are a lot of object files the command line becomes very long.
In this case, the @file.txt format is used to write down the list into a file and offload the command line.

Fix #839

@cmaglie cmaglie self-assigned this Sep 21, 2020
@cmaglie cmaglie changed the title Very long command line builder: use the @ syntax to reduce command line length if needed Sep 21, 2020
@cmaglie
Copy link
Member Author

cmaglie commented Oct 5, 2020

Testing with Marlin firmware 2.0.7 and building for arduino:avr:mega will fail on Windows:

Linking everything together...
"C:\\Users\\ubi\\AppData\\Local\\Arduino15\\packages\\arduino\\tools\\avr-gcc\\7.3.0-atmel3.6.1-arduino7/bin/avr-gcc" -w -Os -g -flto -fuse-linker-plugin -Wl,--gc-sections -mmcu=atmega2560 -o "C:\\Users\\ubi\\AppData\\Local\\Temp\\arduino-sketch-CB28F28118449BBC7C74EF627B774121/Marlin.ino.elf" "@C:\\Users\\ubi\\AppData\\Local\\Temp\\arduino-sketch-CB28F28118449BBC7C74EF627B774121\\object_files.txt" "C:\\Users\\ubi\\AppData\\Local\\Temp\\arduino-sketch-CB28F28118449BBC7C74EF627B774121/core\\core.a" "-LC:\\Users\\ubi\\AppData\\Local\\Temp\\arduino-sketch-CB28F28118449BBC7C74EF627B774121" -lm
avr-gcc: error: CreateProcess: No such file or directory
Error during build: exit status 1

It seems that the @ notation doesn't help here, the object_files.txt size still counts toward the total number of characters (it may be that avr-gcc prepares a command line for avr-ld and that one will exceed the limit?...)

@matthijskooijman
Copy link
Collaborator

(it may be that avr-gcc prepares a command line for avr-ld and that one will exceed the limit?...)

That sounds plausible. Probably Windows has something like Linux' strace so you can trace system calls and confirm? Doesn't really help with a fix, though.

@facchinm
Copy link
Member

facchinm commented Oct 6, 2020

(it may be that avr-gcc prepares a command line for avr-ld and that one will exceed the limit?...)

it does indeed 😞
The only workaround I see right now is archiving the object files, one by one, in a .a and then passing it to the linker.
On Windows in particular this action is painfully slow but it could be the "final" workaround.

@ubidefeo
Copy link

ubidefeo commented Oct 6, 2020

@facchinm
being clear that I don't exactly know what I'm talking about... can we "archive" in batch?
the issue seems to appear at 300 files, can we crate .a in batches of 100?

@facchinm
Copy link
Member

facchinm commented Oct 6, 2020

@ubidefeo sure, the "recipe" is the same we already use to archive the core and libraries with dot_a_linkage
eg. https://github.com/arduino/ArduinoCore-mbed/blob/master/platform.txt#L83
You can replace {object_file} with whatever you want (also 100 files if they fit the commandline).
The only thing to take care is to delete the archive when a file contained in it changes, otherwise we get N copies that will expose the same symbols to the linker.

@matthijskooijman
Copy link
Collaborator

Note that if you put things in an archive that are not normally in there, you also have to wrap the .a file on the commandline between -Wl,--whole-archive and -Wl,--no-whole-archive to make sure that all the .o files in the .a are unconditionally included in the link, rather than only when needed (i.e. don't enable dot_a_linkage accidentally).

This should fix the command line too big issue on Windows.
This is required because gcc-ar checks if an object file is already
in the archive by looking ONLY at the filename WITHOUT the path, so
it may happens that, for example, an object file named "subdir/spi.o",
already inside the archive, may be overwritten by an object file in
"anotherdir/spi.o" only because they are both named spi.o and even if
they are compiled on different directories.
@cmaglie cmaglie force-pushed the very-long-command-line branch from 7669f24 to 8177510 Compare October 27, 2020 00:03
@cmaglie
Copy link
Member Author

cmaglie commented Oct 27, 2020

Hi everybody :-)

I've scratched the @objects_list.txt solution and reimplemented it with the ar-chive. I've made some preliminary tests on Linux and Windows and, finally, it seems to work properly.

Some interesting facts:

  • We need the -Wl,--whole-archive and -Wl,--no-whole-archive as suggested by @matthijskooijman otherwise we will get some random undefined reference. I still don't understand the reason, I thought that using a command line with a list of object files will have the same effect as using an archive with the same object files inside but, clearly, this is not the case.
  • We must create an archive file for each subdirectory because when ar adds files to an archive it checks for duplicate entries only by looking at the name of the file (without the path). In other words, if an archive contains foo/bar.o and we try to add bar/bar.o then foo/bar.o is replaced. This means that we can't make a big archive with all the files inside unless we know that all the object files have a unique name among the whole project but we can not guarantee that and, in particular, this doesn't happen on Marlin where we have HAL/AVR/spi.o or HAL/STM32/spi.o for example.

I've also increased the minimum command line length that triggers the ar-chive creation to 30000. This gives us two advantages:

  • the build speed is not affected for smaller/medium sketches (the 99.99% of the sketches out there)
  • if there is a problem with the new ar-chive system (for example -Wl,--whole-archive not supported or weird ar.patterns or any other issue that I cannot foresee...) then the problem will manifest on a sketch that already will not compile (due to the command line length issue).

@cmaglie cmaglie changed the title builder: use the @ syntax to reduce command line length if needed builder: use ar-chives for linking big sketches (was: use the @ syntax to reduce command line length if needed) Oct 27, 2020
@matthijskooijman
Copy link
Collaborator

We need the -Wl,--whole-archive and -Wl,--no-whole-archive as suggested by @matthijskooijman otherwise we will get some random undefined reference. I still don't understand the reason, I thought that using a command line with a list of object files will have the same effect as using an archive with the same object files inside but, clearly, this is not the case.

Nope. If you put a .o file on the commandline, it is included in the link unconditionally. If you put a .a file in the commandline, then only the .o files inside there that are needed (that have symbols that are currently undefined) are included. This is especially tricky since the linker only does one pass IIRC, so any undefined symbols later on the commandline cannot be resolved by .a files earlier on the commandline (or vice versa, I can never remember). The latter can also be circumvented using -Wl,--start-group or something like that, which keeps iterating files until all references are defined, but that approach is still different from the -Wl,--whole-archive approach or specifying .o files directly (consider a .o file that contains only an ISR, which is never referenced. When using .a files without -Wl,--whole-archive, such a .o file will never be included in the link).

We must create an archive file for each subdirectory because when ar adds files to an archive it checks for duplicate entries only by looking at the name of the file (without the path).

Good point. I have a little doubt about your implementation, though: It now groups object files per directory which could change the order of object files in the link (I suspect this works out in practice because the files in objectFiles are already grouped by directory, so so the end result is the same, but this relies on the way objectFiles is generated). This might not be a real issue, though.

Still, some alternative solutions to consider:

  • Use ar with the P modifier:

       P   Use the full path name when matching or storing names in the archive.  Archives created with full path names are not POSIX compliant, and thus
          may not work with tools other than up to date GNU tools.  Modifying such archives with GNU ar without using P will remove the full path names
          unless the archive is a thin archive.  Note that P may be useful when adding files to a thin archive since r without P ignores the path when
          choosing which element to replace.  Thus
    
                  ar rcST archive.a subdir/file1 subdir/file2 file1
    
          will result in the first "subdir/file1" being replaced with "file1" from the current directory.  Adding P will prevent this replacement.
    

    This would solve the problem, but require modifying the ar recipe (or adding a new one), and has potential for more incompatibilities with older tools maybe, so probably not the best solution.

  • Instead of having an archive file per directory, just start with one object file and switch to a second one only when you find a duplicate name. I.e. start with object_files.a (or object_files_0.a), keeping track of all .o files added to it. As soon as you see a duplicate, switch to object_files_1.a and clear the list of .o files. In common cases, this will probably work with just a single object file, even when many subdirectories are involved, which could be more efficient. In any case, this will never have more .a files than the number of subdirectories (i.e. than the number of .a files in your current approach).

I've also increased the minimum command line length that triggers the ar-chive creation to 30000. This gives us two advantages:

Yup, agreed.

@cmaglie
Copy link
Member Author

cmaglie commented Oct 27, 2020

Nope. If you put a .o file on the commandline, it is included in the link unconditionally. If you put a .a file in the commandline, then only the .o files inside there that are needed (that have symbols that are currently undefined) are included......

Thanks! that explains a bit, but still not everything, because wrt the ordering of .a files:

It now groups object files per directory which could change the order of object files in the link (I suspect this works out in practice because the files in objectFiles are already grouped by directory, so so the end result is the same, but this relies on the way objectFiles is generated). This might not be a real issue, though.

in the current implementation of this PR (8177510) the list of archive files is kept in a map and iterating over a map in golang does not guarantee the ordering. In other words, as it is now, the .a files are placed in the command line in random order, BTW, I never had a single undefined reference until now, even if I try to compile Marlin several times.
This makes me think that --whole-archive will always pick everything in the archive and produce a superset of --start-group.
At this point I'm also wondering what's the purpose of --start-group, since --whole-archive seems to replace it 100%... maybe it had a purpose before LTO?

Instead of having an archive file per directory, just start with one object file and switch to a second one only when you find a duplicate name. I.e. start with object_files.a (or object_files_0.a), keeping track of all .o files added to it. As soon as you see a duplicate, switch to object_files_1.a and clear the list of .o files. In common cases, this will probably work with just a single object file, even when many subdirectories are involved, which could be more efficient. In any case, this will never have more .a files than the number of subdirectories (i.e. than the number of .a files in your current approach).

that was my first approach but, in the end, I did not like it because I found chaotic the resulting build folder, even if we use the objs.a only as an "internal" artifact, I prefer to keep the structure clean because it makes it more accessible to external developers to examine and at the same time easier to debug.

@matthijskooijman
Copy link
Collaborator

In other words, as it is now, the .a files are placed in the command line in random order, BTW, I never had a single undefined reference until now, even if I try to compile Marlin several times.
This makes me think that --whole-archive will always pick everything in the archive and produce a superset of --start-group.

Yes, I think that is true. And in most cases, link order is not really relevant, but I think it can be relevant when multiple but different definitions of symbols are available (i.e. weak or inline symbols), where the link order can determine which of these is selected.

At this point I'm also wondering what's the purpose of --start-group, since --whole-archive seems to replace it 100%... maybe it had a purpose before LTO?

Well, --whole-archive unconditionally includes every file, which you might not want (i.e. like for core.a where we rely on the conditional inclusion to only get serial ISRs when referencing the Serial object). --start-group is just a way to work around the regular process-once mechanism when two .a files have mutual dependencies (note that, I think, all .o files within a .a files are always treated as a group already, so mutual dependencies inside a .a file already work even without --start-group).

@ubidefeo ubidefeo self-requested a review October 28, 2020 15:01
Copy link

@ubidefeo ubidefeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested on Windows 10 and Mac OS 10.14.6 to compile Marlin, which previously failed.
It now works as expected, so approved on my side

Copy link

@ubidefeo ubidefeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@cmaglie
Copy link
Member Author

cmaglie commented Nov 3, 2020

Well, --whole-archive unconditionally includes every file, which you might not want (i.e. like for core.a where we rely on the conditional inclusion to only get serial ISRs when referencing the Serial object). --start-group is just a way to work around the regular process-once mechanism when two .a files have mutual dependencies (note that, I think, all .o files within a .a files are always treated as a group already, so mutual dependencies inside a .a file already work even without --start-group).

I've checked the binary produced by compiling the Marlin firmware before and after the patch:

  • the size is the same
  • the binary content is different, after analyzing it with hexdump and objdump I see that the differences are due:
    • to a timestamp string that is compiled in as a constant
    • to a reordering of some functions (the functions compiled in the .elf are the same only with a slightly different ordering)

After this analysis, I'm for merging this PR as it is now.

@matthijskooijman do you have any other concerns?

@ubidefeo
Copy link

ubidefeo commented Nov 4, 2020

@cmaglie let's wait a bit longer for any comment from @matthijskooijman which are always insightful, but I'd like this to be merged asap and move on :)

@matthijskooijman
Copy link
Collaborator

@ubidefeo Thanks for the prod :-)

I think merging this would be ok. I do still think that the compilation order could be subtly modified and I can probably construct an example where the actual behaviour would be different (between regular and archified commandlines, or between multiple invocations of archivified commandlines) because of that, but I'm pretty sure this is pretty edge casey behavior.

So I still think it would be good to fix this and 1) use a consistent order between multiple invocations and 2) use a consistent order between with and without archive files (e.g. compiling on Windows and Linux), but I'm not sure if this should block merging this fix now. Maybe merge and create an issue to track this for later?

@ubidefeo
Copy link

ubidefeo commented Nov 4, 2020

Yours are all valid observations, @matthijskooijman

but I'm not sure if this should block merging this fix now. Maybe merge and create an issue to track this for later?

I am ok with this as it fixes something that has been lurking around for a while.

As for creating another issue, feel free to open one which also includes the edge cases you're thinking of

@janjongboom
Copy link

This is our no. 1 support issue with Arduino at the moment, so not blocking on it would be very helpful on our end.

@cmaglie cmaglie merged commit bb42ebe into arduino:master Nov 12, 2020
@cmaglie cmaglie deleted the very-long-command-line branch November 12, 2020 10:19
@janjongboom
Copy link

@cmaglie @ubidefeo Which IDE / CLI version will this be released in?

@ubidefeo
Copy link

@janjongboom this will end up in CLI 0.14.0 (already in nightly) but I believe it's already in the IDE's nightly build.
CLI 0.14.0 is very close :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mbed target - Linking fails under Windows with large list of source files
6 participants