Most program binaries are laid out with little to no regard for how programs get loaded from disk. This disconnect between the compile-time and runtime behaviour of binaries imposes a significant performance penalty on large applications such as browsers and office suites.
It is incredibly difficult to observe both the cause of binary-induced IO (i.e. calling a random function) and the effect (the program gets suspended during startup while parts of it are being loaded from disk), so this area doesn’t get as much optimization love as it deserves.
My estimate is that around 50% of Firefox startup time is wasted on suboptimal binary layout. My previous post demonstrated the kind of difference a better binary layout can make. Note that reordering executables isn’t the only solution: eliminating dead code should also speed things up (though deleting dead code is hard).
Optimizing Binary Layout
Disclaimer: I just finished my 3rd rewrite of icegrind a few hours ago, so be gentle.
Step 1a: Produce a build
Since I am interested in reorganizing program binaries, I build Mozilla with “-ffunction-sections -fdata-sections” in CFLAGS/CXXFLAGS.
I also prelink the binaries in dist/bin such that my binaries better correspond to how they will be used:
prelink $LD_LIBRARY_PATH/firefox-bin $LD_LIBRARY_PATH/*.so
Step 1b: Produce a description of interesting files
I use my elflog utility to produce a .sections description of the files I’m interested in. Elflog looks at the symbol table and tries to infer section names (produced by -ffunction-sections -fdata-sections) from symbol names/locations (see also the --print-map option for ld).
elflog --contents libxul.so > libxul.so.sections
elflog currently emits non-existent .comment.* sections because it gets confused by 0-length sections such as .bss.
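To make the inference step concrete, here is a minimal sketch of the naming convention involved: with -ffunction-sections and -fdata-sections, GCC typically places a function foo in .text.foo and a variable bar in .data.bar, .bss.bar or .rodata.bar. The function name guess_section and the nm-style symbol kind letters are assumptions made for illustration; this is not elflog's actual code, and real binaries have more cases than the four handled here.

```python
# Hypothetical sketch, not elflog's real logic: guess the per-symbol section
# name that -ffunction-sections / -fdata-sections would have produced.
# Symbol kinds use nm-style letters (lowercase means a local symbol).

PREFIX_BY_KIND = {
    "T": ".text",    # function
    "D": ".data",    # initialized data
    "B": ".bss",     # zero-initialized data
    "R": ".rodata",  # read-only data
}

def guess_section(symbol_name, kind):
    """Return a likely section name, or None for symbol kinds not handled here."""
    prefix = PREFIX_BY_KIND.get(kind.upper())
    if prefix is None:
        return None
    return f"{prefix}.{symbol_name}"
```

A guess like this still has to be checked against symbol locations, which is where 0-length sections can cause confusion.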
Note that one can also build tools to describe other kinds of files, such as jar or sqlite files. The only limitation is that icegrind currently tracks only mmap()-caused disk IO, though it would be trivial to extend it to deal with open/seek/read-style disk IO.
Step 2: Produce a log with icegrind!
Apply my icegrind patch, build+install valgrind.
valgrind --tool=icegrind firefox-bin -profile /tmp/ff -no-remote
This will produce a .log file for every mmap()ed file that has a .sections description. Each log lists sections chronologically, in the order they were accessed.
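The idea behind such a log can be sketched in a few lines: given a table of sections and a time-ordered trace of accessed file offsets, emit each section the first time it is touched. Everything here is an assumption made for illustration (the function name first_touch_order, the (name, start, size) tuples, and the choice to record only the first access); it is not how icegrind itself is implemented.

```python
from bisect import bisect_right

def first_touch_order(sections, accesses):
    """Hypothetical sketch of building an access-ordered section log.

    sections: list of (name, start_offset, size) tuples
    accesses: file offsets in the order they were touched
    Returns section names in order of first access.
    """
    # Skip 0-length sections: they cannot contain any offset.
    ordered = sorted((s for s in sections if s[2] > 0), key=lambda s: s[1])
    starts = [s[1] for s in ordered]
    seen = set()
    log = []
    for offset in accesses:
        # Find the last section that starts at or before this offset.
        i = bisect_right(starts, offset) - 1
        if i < 0:
            continue
        name, start, size = ordered[i]
        if offset >= start + size:
            continue  # offset falls in a gap between sections
        if name not in seen:
            seen.add(name)
            log.append(name)
    return log
```

For example, with sections a, b and c laid out back to back, a trace that touches c, then a, then b yields the order c, a, b regardless of where those sections currently sit in the file.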
Step 3: Tell gold to link using the above log
Build/install binutils (I use a CVS checkout from a month ago) with the section ordering patch, configuring with --enable-gold.
To reorder the binary, I just add -Wl,--section-ordering-file,libxul.so.log to my linker command line.
Note that there are still some teething issues with this patch: it exhibits N^2 behavior (i.e. it takes 10 minutes to link libxul.so with it) and occasionally swaps the order of .rela.plt and .rela.dyn, which upsets prelink. But unlike my earlier attempt with linker scripts, it does not affect the binary size.
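As a rough illustration that the ordering step itself need not be quadratic, the sketch below sorts input sections by their position in an ordering file using a dictionary lookup. This is a standalone sketch under my own assumptions (the function name order_sections and the rule that unlisted sections keep their original relative order at the end); it does not describe the gold patch or where its slowdown actually comes from.

```python
def order_sections(input_sections, ordering):
    """Hypothetical sketch: reorder section names to follow an ordering list.

    Sections named in `ordering` come first, in that order; the rest keep
    their original relative order after them.
    """
    # Map each name to its first position in the ordering file: O(1) lookups
    # instead of a linear scan per section.
    rank = {}
    for position, name in enumerate(ordering):
        rank.setdefault(name, position)
    unlisted = len(ordering)
    # sorted() is stable, so sections sharing a rank stay in input order.
    return sorted(input_sections, key=lambda name: rank.get(name, unlisted))
```

With n input sections this costs one pass to build the table plus an O(n log n) sort.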
Step 4: Enjoy!
Now strip, install, and prelink your binaries and enjoy faster startup.
I would like to see the gold patch fixed up and landed. Once that is done I’d like to turn this on for our Linux and mobile Linux builds.
I am hoping that some sort of sensible ordering of binaries will become commonplace in the future.