I have been told that it should be possible to control the way the GNU linker lays out binaries. Unfortunately until recently I couldn’t figure out the right incantations to convince ld to do my bidding. Turns out what I needed was to be stranded on a beach in Fiji with nothing better to do than to reread the ld info page a few times.
- Produce 2 mozilla builds:
  - A tracing build with -finstrument-functions in CXXFLAGS/CFLAGS
  - A release build with -ffunction-sections and -fdata-sections in CXXFLAGS/CFLAGS, so that every function and every piece of static data (mostly variables) gets its own section and the linker can move things around at that granularity
- Link my profile.cpp into libxul in the tracing build (profile.cpp itself is compiled without the -finstrument-functions flag)
- Run the tracing build, capturing the spew from profile.cpp into a log file
- Feed the log file to my script to produce linker scripts. This generates a library.so.script file for each of the Mozilla libraries.
- Rebuild the relevant libraries in the release build with the -T library.so.script linker flag
- Enjoy faster startup
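To make the log-to-linker-script step more concrete, here is a minimal sketch in Python. It is not my actual script: it assumes the log has already been reduced to one mangled symbol name per line, in the order the functions were first called, and the function names `ordered_text_patterns` and `text_section_body` are made up for illustration. It relies on two things that are standard behaviour: with -ffunction-sections, GCC places each function in a section named `.text.<symbol>`, and ld assigns an input section to the first pattern in the script that matches it.

```python
from typing import Iterable, List


def ordered_text_patterns(log_lines: Iterable[str]) -> List[str]:
    """Turn a call log into ld input-section patterns, in first-call order.

    Assumption: each log line holds exactly one mangled symbol name.
    """
    seen = set()
    patterns = []
    for line in log_lines:
        symbol = line.strip()
        # Skip blank lines and functions we have already placed.
        if not symbol or symbol in seen:
            continue
        seen.add(symbol)
        # -ffunction-sections puts each function in .text.<symbol>.
        patterns.append(f"*(.text.{symbol})")
    # Catch-all for code that never ran during the traced startup.
    # ld uses the first matching pattern, so this only picks up the rest.
    patterns.append("*(.text .text.*)")
    return patterns


def text_section_body(log_lines: Iterable[str], indent: str = "    ") -> str:
    """Format the patterns as the body of a `.text : { ... }` output section."""
    return "\n".join(indent + p for p in ordered_text_patterns(log_lines))


if __name__ == "__main__":
    # Hypothetical log contents, including a repeated call and a blank line.
    sample_log = ["main", "_Z4initv", "main", "", "_Z3runv"]
    print(text_section_body(sample_log))
```

The output is only the body of the `.text` output section. It still has to be spliced into a complete linker script (for example a copy of the default one printed by `ld --verbose`) before it can be passed with -T.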
This results in a 200ms faster startup on my 7200rpm laptop hard drive, which is about 10% of my startup time. I think that’s pretty good for a proof of concept. Unfortunately there isn’t a measurable win on the SSD (not surprising) nor a reduction in memory usage (I expected one due to not having to page in code that isn’t needed for Firefox startup).
I suspect the problem is that data sections need to be laid out adjacent to functions that refer to them. I started sketching out a treehydra script to extract that info.
I posted the relevant testcase and scripts. Do hg clone http://people.mozilla.com/~tglek/startup/ld to see the simple testcase and various WIP firefox scripts.
The majority of Firefox startup overhead (prior to rendering of web pages) comes from frustrating areas such as inefficient libraries (e.g. fontconfig, gtk) and the mess caused by crappy layout of binaries and overuse of dynamic libraries. This post describes one small step towards fixing the crappy layout of our binaries.
I would like to end up in a world where our binaries are static and laid out such that they are read sequentially on startup (such that we can use the massive sequential read speeds provided by modern storage media). Laying out code/data properly should result in memory usage reductions which should be especially welcome on Fennec (especially on Windows Mobile).
I am hoping to see 30-50% startup time improvements from this work if everything goes according to plan.