icegrind – Valgrind Plugin for Optimizing Cold Startup

Most program binaries are laid out with little to no regard for how programs get loaded from disk. This disconnect between the compile-time and runtime behaviour of binaries imposes a significant performance penalty on large applications such as browsers, office suites, etc.

It is incredibly difficult to observe both the cause (i.e. calling a random function) of binary-induced IO and the effect (the program gets suspended during startup while parts of it are being loaded from disk), so this area doesn’t get as much optimization love as it deserves.

My estimate is that around 50% of Firefox startup time is wasted on suboptimal binary layout. My previous post demonstrated the kind of difference a better binary layout can make. Note that reordering executables isn’t the only solution: eliminating dead code should also speed things up (though deleting dead code is hard).

Optimizing Binary Layout
Disclaimer: I just finished my 3rd rewrite of icegrind a few hours ago, so be gentle.

Ingredients: Valgrind SVN trunk + icegrind patch, GNU Gold + section-ordering-file patch, a way to describe contents of binaries.

Step 1a: Produce a build
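Since I am interested in reorganizing program binaries, I build Mozilla with “-ffunction-sections -fdata-sections” in CFLAGS/CXXFLAGS. These flags make the compiler place each function and data item in its own section, which is what lets the linker reorder them individually.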

I also prelink the binaries in dist/bin such that my binaries better correspond to how they will be used:
prelink $LD_LIBRARY_PATH/firefox-bin $LD_LIBRARY_PATH/*.so

Step 1b: Produce a description of interesting files
I use my elflog utility to produce a .sections description of the files I’m interested in. Elflog looks at the symbol table and tries to infer section names (produced by -ffunction-sections -fdata-sections) from symbol names/locations (see also the --print-map option for ld).

elflog --contents libxul.so > libxul.so.sections
elflog currently emits non-existent .comment.* sections because it gets confused by 0-length sections such as .bss.
Note that one can also build tools to describe other kinds of files, such as jar or sqlite files. The only limitation is that icegrind currently only tracks mmap()-caused disk IO; it would be trivial to extend it to deal with open/seek/read-style disk IO.
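
To make the inference idea concrete, here is a minimal Python sketch. It is not elflog's actual code, and the Symbol tuple, the PREFIX table and guess_sections() are hypothetical names chosen for illustration. It relies only on the GCC convention that -ffunction-sections and -fdata-sections name sections after the symbol they hold (for example .text.foo or .data.bar). The real .sections file format is not described in this post, so the sketch just returns tuples.

```python
from typing import NamedTuple


class Symbol(NamedTuple):
    """One symbol table entry (hypothetical, simplified shape)."""
    name: str     # mangled symbol name, e.g. "_ZN3Foo3barEv"
    address: int  # where the symbol lives in the file
    size: int     # size in bytes
    kind: str     # "func", "object" or "bss"


# Assumed mapping from symbol kind to the per-symbol section prefix.
PREFIX = {"func": ".text.", "object": ".data.", "bss": ".bss."}


def guess_sections(symbols):
    """Return (address, size, section_name) tuples sorted by address.

    Zero-sized symbols and unknown kinds are skipped, since they cannot be
    mapped to a byte range.
    """
    guessed = []
    for sym in symbols:
        if sym.size == 0 or sym.kind not in PREFIX:
            continue
        guessed.append((sym.address, sym.size, PREFIX[sym.kind] + sym.name))
    return sorted(guessed)
```

A description like this is all a tracing tool needs in order to translate "offset X of the file was touched" into "section Y was touched".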

Step 2: Produce a log with icegrind!
Apply my icegrind patch, build+install valgrind.
Run Firefox under icegrind:
valgrind --tool=icegrind firefox-bin -profile /tmp/ff -no-remote
This will produce a .log file for every mmap()ed file that has a .sections description. Each log lists sections chronologically, in order of access.
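
The logging idea can be sketched in a few lines of Python. This is an illustration only, not how icegrind is implemented inside Valgrind. The function name first_access_order() is made up, and it assumes that each section is listed once, at the point where it is first touched. The input is a sorted, non-overlapping list of (start, size, name) ranges plus a stream of accessed file offsets.

```python
import bisect


def first_access_order(sections, accesses):
    """Return section names in the order they were first touched.

    sections: list of (start, size, name), sorted by start, non-overlapping.
    accesses: iterable of file offsets in the order they were accessed.
    """
    starts = [start for start, _size, _name in sections]
    seen = set()
    order = []
    for offset in accesses:
        # Find the last section that starts at or before this offset.
        i = bisect.bisect_right(starts, offset) - 1
        if i < 0:
            continue  # offset lies before the first described section
        start, size, name = sections[i]
        if offset >= start + size:
            continue  # offset falls in a gap between described sections
        if name not in seen:
            seen.add(name)
            order.append(name)
    return order
```

Sections that are never touched do not appear in the result at all, which is the point: only what startup actually needs ends up in the ordering.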

Step 3: Tell gold to link using the above log
Build/install binutils (I use a CVS checkout from a month ago) with the section ordering patch, specifying --enable-gold at configure time.
To reorder the binary, I just add -Wl,--section-ordering-file,libxul.so.log to my linker command line.
Note that there are still some teething issues with this patch: it exhibits N^2 behavior (i.e. it takes 10min to link libxul.so with it) and occasionally swaps the order of .rela.plt and .rela.dyn, which makes prelink upset. But unlike my earlier attempt with linker scripts, it does not affect the binary size.
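
Since elflog can emit .comment.* names that do not exist in the binary, one optional precaution is to tidy the log before handing it to the linker. The Python sketch below is a hypothetical helper, not part of icegrind or gold. It assumes the log holds one section name per line, which this post does not spell out, and the post does not say whether the linker actually needs this cleanup.

```python
def clean_ordering(lines):
    """Drop blank lines, .comment.* entries and repeated section names.

    The first occurrence of each name is kept, so the chronological
    order of the log is preserved.
    """
    seen = set()
    kept = []
    for line in lines:
        name = line.strip()
        if not name or name.startswith(".comment.") or name in seen:
            continue
        seen.add(name)
        kept.append(name)
    return kept
```

The cleaned list could then be written back out, one name per line, and passed via the section ordering option in place of the raw log.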

Step 4: Enjoy!
Now strip, install, prelink your binaries and enjoy faster startup.

Plans

I would like to see the gold patch fixed up and landed. Once that is done, I’d like to turn this on for our Linux and mobile Linux builds.

I am hoping that some sort of sensible ordering of binaries will become commonplace in the future.

7 comments

  1. You seem to work so often in unexplored areas of software, it must be quite exciting! I hope all of your discoveries can help future software development.

  2. I just have to ask:

    How much of what you’ve learned about profiling and optimizing the code loading process with binaries and disk I/O would translate to developing better tools to profile and optimize the code loading process with webapps and network I/O?

    Dave

  3. Thanks for a fascinating post (as usual ;-)

    Please push as much of your work as possible upstream to various projects (e.g. GCC).

  4. The most consistently interesting posts on Planet Mozilla! It’s inspiring to see you find so many opportunities to do things better by looking closely at what actually takes place.

    Please modify gold and mozilla build scripts to create a FF/TB Omnijar starting on the disk’s second cylinder that the PC’s BIOS unpacks into memory at 0xbeeffeed and then executes. :-)

  5. Nice post. I will work on landing the gold patch with those issues resolved asap.

  6. Now if we could just get its equivalent for Java, to speed up startup…