Opening files is relatively expensive. There is a small syscall overhead and a higher overhead of fetching data from disk. Depending on physical data layout and disk type, this can leave modern CPUs twiddling their thumbs for a long time while the disk skips around fetching all of the different file pieces.
Optimization #1: Fewer naked files
About two years ago I started gathering naked files on disk and shoving them into jars (e.g. bug 508421). We made jar reading as efficient as possible by cleaning up code and switching to mmap. Eventually all application data files read from disk during “normal” startup ended up in jars. Unfortunately we ended up with four jars (toolkit, chrome + 2 locale jars), which felt silly. Also, due to limitations in XPCOM, a lot of naked files were still read from disk on version upgrades and extension installation.
Optimization #2: One jar to rule them all
Recently Michael Wu unleashed a can of omnijar whoopass. This was a massive effort driven by Android packaging requirements. Now application startup data is always being read from one file. This implies better data locality, less seeking, less waiting. One benefit of packing files tightly is that the OS speculatively reads data from disk in chunks that are usually larger than what the application requests. This makes reading nearby files free. Unfortunately there was no good way to predict the order that files will be accessed in without actually running Firefox, so there was more room for improvement.
Optimization #3: Optimized jar layout
So now that all of our data was in one file, the next logical step was to pack it intelligently. The only way to do this is to profile Firefox startup and then order the jar according to that. Unfortunately, even once all of the jar entries were laid out sequentially, we were still doing our IO suboptimally. This was due to the fact that the zip index (jars are zip files) is traditionally located at the end of the file. The Wikipedia entry on the zip format has pictures to illustrate this.
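To make the “index at the end” point concrete, here is a minimal Python sketch (not Firefox code) that builds a tiny stored zip in memory and locates its end of central directory record. The helper names `make_sample_jar` and `find_central_directory` are made up for this illustration, and the sketch assumes a plain zip without zip64 extensions.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
# signature, disk number, disk with directory, entries on this disk,
# total entries, directory size, directory offset, comment length
EOCD_FORMAT = "<4sHHHHIIH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes when there is no comment


def make_sample_jar() -> bytes:
    """Build a small uncompressed zip in memory (hypothetical test data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("a.js", "1" * 100)
        zf.writestr("b.js", "2" * 100)
    return buf.getvalue()


def find_central_directory(data: bytes):
    """Return (eocd_position, entry_count, directory_size, directory_offset)."""
    # A reader has to start from the END of the file to find the index.
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("no end of central directory record found")
    fields = struct.unpack_from(EOCD_FORMAT, data, pos)
    _, _, _, _, count, cd_size, cd_offset, _ = fields
    return pos, count, cd_size, cd_offset


if __name__ == "__main__":
    jar = make_sample_jar()
    eocd_pos, count, cd_size, cd_offset = find_central_directory(jar)
    print(f"{count} entries, index at byte {cd_offset} of {len(jar)}")
```

In a traditional zip the index sits after every entry, so a reader seeks to the tail of the file first and then jumps back toward the front for the data.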
In order to maximize readahead benefits and minimize disk seeks it would be nice to have the file index at the front of the file. So I changed our zip layout from
<entry1><entry2>…<entryN><central directory><end of central directory>
to
<offset of the last entry read on startup><central directory><end of central directory><entry1><entry2>…<entryN><end of central directory>
So all I did was change the offset in <end of central directory> to always be 4 (it can’t be 0 because picky zip programs balk at “NULL” central directory offsets). Then I added a second identical <end of central directory> entry to keep the rule that the central directory is always followed by one. I also used the extra 4 bytes forced upon me by overly vigilant zip programs to store a number indicating how much data we can preread on startup.
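The rearrangement can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the actual Firefox packaging code: `optimize_layout` and `read_stored_entry` are hypothetical names, the readahead value is simply passed in by the caller and stored little-endian, and the input is assumed to be a plain zip with no archive comment, no zip64 records, and entries starting at byte 0. Since some zip readers reject this layout, the sketch parses the bytes by hand instead of relying on a zip library.

```python
import io
import struct
import zipfile

EOCD_FORMAT = "<4sHHHHIIH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes without a comment
CD_OFFSET = 4  # the directory starts right after the 4-byte readahead value


def optimize_layout(data: bytes, readahead: int) -> bytes:
    """Move the central directory of a plain zip to the front of the file."""
    eocd_pos = data.rfind(b"PK\x05\x06")
    if eocd_pos < 0:
        raise ValueError("no end of central directory record found")
    _, _, _, _, count, cd_size, cd_offset, _ = struct.unpack_from(
        EOCD_FORMAT, data, eocd_pos)
    entries = data[:cd_offset]
    directory = bytearray(data[cd_offset:cd_offset + cd_size])
    # Every entry moves back by the size of everything now in front of it.
    shift = CD_OFFSET + cd_size + EOCD_SIZE
    pos = 0
    for _ in range(count):
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", directory, pos + 28)
        (local_offset,) = struct.unpack_from("<I", directory, pos + 42)
        struct.pack_into("<I", directory, pos + 42, local_offset + shift)
        pos += 46 + name_len + extra_len + comment_len
    eocd = struct.pack(EOCD_FORMAT, b"PK\x05\x06", 0, 0, count, count, cd_size, CD_OFFSET, 0)
    # readahead, directory, end record, entries, second identical end record
    return struct.pack("<I", readahead) + bytes(directory) + eocd + entries + eocd


def read_stored_entry(jar: bytes, wanted: str) -> bytes:
    """Fetch an uncompressed entry by walking the directory by hand."""
    _, _, _, _, count, _, cd_offset, _ = struct.unpack_from(
        EOCD_FORMAT, jar, len(jar) - EOCD_SIZE)
    pos = cd_offset
    for _ in range(count):
        (size,) = struct.unpack_from("<I", jar, pos + 20)
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", jar, pos + 28)
        (local_offset,) = struct.unpack_from("<I", jar, pos + 42)
        if jar[pos + 46:pos + 46 + name_len].decode() == wanted:
            local_name_len, local_extra_len = struct.unpack_from("<HH", jar, local_offset + 26)
            start = local_offset + 30 + local_name_len + local_extra_len
            return jar[start:start + size]
        pos += 46 + name_len + extra_len + comment_len
    raise KeyError(wanted)


def make_plain_jar() -> bytes:
    """Build a small uncompressed zip in memory (hypothetical test data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("a.js", "1" * 100)
        zf.writestr("b.js", "2" * 100)
    return buf.getvalue()
```

The only bytes that change are the local header offsets inside the central directory and the directory offset in the end record. The entries themselves are copied untouched.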
This yielded a 2-3x reduction in disk io over an unoptimized omnijar. This is on top of a >30-100x reduction achieved by going from naked files to omnijar.
The downside of my interpretation of the zip spec is that some zip programs expect zip files to be more rigid than the spec allows. Older versions of Firefox, Microsoft’s zip support in Windows, WinRAR, unix zip programs, etc. accept my optimized jars. 7zip and broken antivirus scanners (it’s a security risk to be overly picky) fail to read them.
Trivia: this isn’t the first time we got tripped up by picky zip reading code. For example, the Android apk reader irritatingly insists on having a zip entry at byte zero of an Android package. This means that one can’t use apks to do the Android equivalent of self-extracting .exe files on Windows. Michael Wu is writing a custom library loader to deal with that 🙂
Optimization #4: More Omnijar
Feeling that omnijar wasn’t awesome enough, Michael Wu went ahead and omnijared extensions. Most extensions will no longer need to be unpacked from xpi files. This also means that extension authors can opt to use the optimized jar format above to further speed up Firefox startup.
Other jar optimizations
Switching to jars via the startup cache will allow us to further optimize our first startup. There is also the option of halving our jar IO further by actually making use of that readahead integer I added to optimized jars.
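As a rough idea of what using that integer could look like, here is a small Python sketch. It is an assumption-laden illustration rather than a description of Firefox code: `preread` is a hypothetical helper, and it assumes the first 4 bytes of the jar hold a little-endian byte count measured from the start of the file.

```python
import struct


def preread(path: str) -> int:
    """Read the leading 4-byte hint, then pull that many bytes in one go.

    Returns the number of bytes actually read.
    """
    with open(path, "rb") as f:
        header = f.read(4)
        if len(header) < 4:
            return 0
        (readahead,) = struct.unpack("<I", header)
        f.seek(0)
        # One large sequential read. The bytes are thrown away; the point is
        # that later small reads of startup entries can be served from the
        # OS page cache instead of triggering separate disk accesses.
        return len(f.read(readahead))
```

A single big sequential read up front trades a little extra memory traffic for fewer separate trips to the disk during startup.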