Teaching ld to optimize binaries for startup

I have been told that it should be possible to control the way the GNU linker lays out binaries. Unfortunately until recently I couldn’t figure out the right incantations to convince ld to do my bidding. Turns out what I needed was to be stranded on a beach in Fiji with nothing better to do than to reread the ld info page a few times.

Recipe:

  1. Produce 2 Mozilla builds:
    A tracing build with -finstrument-functions in CXXFLAGS/CFLAGS
    A release build with -ffunction-sections and -fdata-sections in CXXFLAGS/CFLAGS, so the linker can move things at the granularity of individual functions or static data (mostly variables)
  2. Link my profile.cpp into libxul in the tracing build (profile.cpp itself is compiled without the -finstrument-functions flag)
  3. Run the tracing build, capturing the spew from profile.cpp into a log file
  4. Feed the log file to my script to produce a linker script. This will produce library.so.script files for all of the Mozilla libraries.
  5. Rebuild the relevant libraries in the release build with the -T library.so.script linker flag
  6. Enjoy faster startup
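
Step 4 can be sketched roughly as follows. This is a minimal illustration in Python, not the actual script from the repository. It assumes the log has already been reduced to one mangled function name per line in call order (the real profile.cpp output format is not described here), and that -ffunction-sections placed each function in a section named .text.<symbol>. The symbol names and helper names below are hypothetical.

```python
def first_call_order(log_lines):
    """Return function names in order of first appearance, skipping blanks."""
    seen = set()
    order = []
    for line in log_lines:
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            order.append(name)
    return order


def text_section_fragment(symbols):
    """Build a .text output-section fragment that lists startup functions first."""
    lines = ["  .text : {"]
    for sym in symbols:
        # One input-section pattern per function, in startup order.
        lines.append(f"    *(.text.{sym})")
    # Catch-all for every function that was not called during startup.
    lines.append("    *(.text .text.*)")
    lines.append("  }")
    return "\n".join(lines)


# Hypothetical trace: repeated calls and blank lines are ignored.
log = ["_Z8startAppv", "_Z4initv", "_Z8startAppv", "", "_Z6renderv"]
fragment = text_section_fragment(first_call_order(log))
print(fragment)
```

The output is only the .text portion of a linker script. A script passed with -T replaces ld's default script, so a fragment like this would have to be spliced into a complete script (ld --verbose prints the default one) before it could be used in step 5.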

This results in 200ms faster startup on my 7200rpm laptop hard drive, which is about 10% of my startup time. I think that’s pretty good for a proof of concept. Unfortunately there isn’t a measurable win on the SSD (not surprising) nor a reduction in memory usage (I expected one due to not having to page in code that isn’t needed for Firefox startup).

I suspect the problem is that data sections need to be laid out adjacent to functions that refer to them. I started sketching out a treehydra script to extract that info.

I posted the relevant testcase and scripts. Do hg clone http://people.mozilla.com/~tglek/startup/ld to see the simple testcase and various WIP firefox scripts.

Long-term Expectations

The majority of Firefox startup overhead (prior to rendering of web pages) comes from frustrating areas such as inefficient libraries (e.g. fontconfig, gtk) and the mess caused by crappy layout of binaries and overuse of dynamic libraries. This post describes one small step towards fixing the crappy layout of our binaries.

I would like to end up in a world where our binaries are static and laid out such that they are read sequentially on startup (such that we can use the massive sequential read speeds provided by modern storage media). Laying out code/data properly should result in memory usage reductions which should be especially welcome on Fennec (especially on Windows Mobile).

I am hoping to see 30-50% startup time improvements from this work if everything goes according to plan.

10 comments

  1. To gain less than a second of startup time (typically the thing you do every other minute, right?), you’re actually considering static binaries ? Oh my…

  2. Yeah. Note static in this case means folding ff libraries into the main firefox binary. See https://bugzilla.mozilla.org/show_bug.cgi?id=525013

    It’s likely that linux distributions will continue to ship the dynamic build.

  3. This is pretty awesome. 10-50% startup improvements make me giddy.

  4. Will it be possible to do something similar on Windows too?

  5. Amazing stuff. Indeed, reducing the number of loaded libraries can be an easy win. This makes me want to rip out the remains of libgnome(ui) and gnome-vfs in favor of straight glib/gtk… if only mozilla wouldn’t be so conservative bumping the required versions of those libraries.

    Talking about gtk: why are you claiming it to be inefficient? Fixing it would benefit the gnome desktop as a whole.

  6. Fixing gtk doesn’t quite seem to be Mozilla’s core competency, and we have a comparative advantage in fixing Mozilla bugs that affect all platforms to similar extents (just as gtk hackers have a comparative advantage in fixing gtk bugs).

    Now, if some gtk hacker stepped up and walked through things with some Mozilla people, it might make sense to spend some time on it…

  7. I read through your steps, and my first thought was “that sounds like it could be automated into the build system.” Then my second thought was “why doesn’t GCC’s PGO just do this for us?” Seriously, couldn’t the PGO optimizer be reordering functions for us? Should we file a GCC bug on that?

  8. Maybe if there are problems with gtk it might be worth asking on the mailing list if anybody wants to help, or even consider organising a hackfest with the gtk guys.

  9. Am I completely off base or does all of this performance tuning pale in comparison to the time FF takes to phone home and check for updates for itself and all addons.

    When behind the proxy at my work, on my underpowered work computer which is ridiculously overloaded with security crap my FF start up time is well in excess of a minute.

    Someone needs to create an update suppression addon that prevents FF from making behind-the-scenes connections to the internet to check for updates.

  10. @Eric, update checking can be disabled. I do that for the “lab image” at work, because updates applied to lab machines get erased at reboot :)

    Examples are:
    user_pref("app.update.enabled", false);
    user_pref("browser.search.update", false);
    user_pref("extensions.update.enabled", false);