14
Sep 10

Firefox 4: jar jar jar

Opening files is relatively expensive. There is a small syscall overhead and a higher overhead of fetching data from disk. Depending on physical data layout and disk type, this can leave modern CPUs twiddling their thumbs for a long time while the disk skips around fetching all of the different file pieces.

Optimization #1: Fewer naked files

About two years ago I started gathering naked files on disk and shoving them into jars (e.g., bug 508421). We made jar reading as efficient as possible by cleaning up code and switching to mmap. Eventually all application data files read from disk during “normal” startup ended up in jars. Unfortunately, we ended up with four jars (toolkit, chrome and two locale jars), which felt silly. Due to limitations in XPCOM, a lot of naked files were still read from disk on version upgrades and extension installation.

Optimization #2: One jar to rule them all

Recently Michael Wu unleashed a can of omnijar whoopass. This was a massive effort driven by Android packaging requirements. Now application startup data is always read from one file. This implies better data locality, less seeking and less waiting. One benefit of packing files tightly is that the OS speculatively reads data from disk in chunks that are usually larger than what the application requests, which makes reading nearby files essentially free. Unfortunately, there was no good way to predict the order in which files would be accessed without actually running Firefox, so there was still room for improvement.
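To make the readahead argument concrete, here is a toy model in Python. It assumes the OS always fetches fixed-size, aligned windows (a hypothetical 128 KiB here) and that a window stays cached once fetched; real kernel readahead is adaptive, so the numbers are illustrative only. `windows_touched` is a name invented for this sketch.

```python
def windows_touched(reads, window=128 * 1024):
    """Count the distinct aligned readahead windows a list of reads would hit.

    Toy model: every read pulls in each fixed-size, window-aligned chunk it
    overlaps, and a chunk fetched once stays cached for later reads.
    reads: iterable of (offset, length) pairs in bytes, with length > 0.
    """
    touched = set()
    for offset, length in reads:
        first = offset // window
        last = (offset + length - 1) // window
        touched.update(range(first, last + 1))
    return len(touched)


# Three small files packed back to back vs. scattered across a large archive.
packed = [(0, 3000), (3000, 5000), (8000, 2000)]
scattered = [(0, 3000), (1_000_000, 5000), (5_000_000, 2000)]
print(windows_touched(packed), windows_touched(scattered))  # prints: 1 3
```

In this model, packing the same three files tightly turns three disk fetches into one. That is the "nearby files are free" effect, provided the files are laid out in the order they are read.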

Optimization #3: Optimized jar layout

Now that all of our data was in one file, the next logical step was to pack it intelligently. The only way to do this is to profile Firefox startup and then order the jar entries accordingly. Unfortunately, even once all of the jar entries were laid out sequentially, we were still doing our IO suboptimally. This was because the zip index (jars are zip files) is traditionally located at the end of the file. The Wikipedia entry on the zip format has pictures to illustrate this.
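To see why a standard reader has to touch the end of the file first, here is a small Python sketch. It maps a jar with mmap (which the jar reader switched to, per the first section) and finds the end of central directory (EOCD) record by scanning backwards over the tail. `locate_central_directory` and `demo` are hypothetical helpers. The EOCD field layout follows the zip format.

```python
import mmap
import os
import struct
import tempfile
import zipfile

EOCD_SIG = b"PK\x05\x06"    # end of central directory signature
EOCD_FMT = "<4sHHHHIIH"     # fixed 22-byte part of that record
MAX_COMMENT = 0xFFFF        # the archive comment length field is 16 bits


def locate_central_directory(path):
    """Return (entry_count, cd_offset, cd_size, eocd_offset) for a zip/jar file."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # The EOCD record sits at the very end, optionally followed by a comment,
        # so a reader has to look at the tail of the file before anything else.
        tail_start = max(0, len(m) - struct.calcsize(EOCD_FMT) - MAX_COMMENT)
        eocd = m.rfind(EOCD_SIG, tail_start)
        if eocd < 0:
            raise ValueError("no end of central directory record found")
        _, _, _, _, count, cd_size, cd_offset, _ = struct.unpack_from(EOCD_FMT, m, eocd)
        return count, cd_offset, cd_size, eocd


def demo():
    """Write a small standard zip and report where its index ended up."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example.jar")
        with zipfile.ZipFile(path, "w") as z:
            for name in ("a.txt", "b.txt", "c.txt"):
                z.writestr(name, "hello " + name)
        return locate_central_directory(path), os.path.getsize(path)
```

In a standard jar, the central directory ends exactly where the EOCD record begins, and both sit at the back of the file. So even a perfectly ordered jar starts with a read near its end before any entry data is touched.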

In order to maximize readahead benefits and minimize disk seeks, it would be nice to have the file index at the front of the file. So I changed our zip layout from

<entry1><entry2>…<entryN><central directory><end of central directory>

to

<offset of the last entry read on startup><central directory><end of central directory><entry1><entry2>…<entryN><end of central directory>

So all I did was change the offset in <end of central directory> to always be 4 (it can’t be 0 because anal zip programs balk at “NULL” central directory offsets). Then I added a second identical <end of central directory> entry to keep the rule that the central directory is always followed by one. I also used the extra space forced upon me by overly vigilant zip programs to store a number indicating how much data we can preread on startup.
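Below is a Python sketch of the layout described above. It is not the actual Mozilla packaging code. It assumes entries are stored uncompressed with zeroed timestamps. It orders entries by a startup access log. It also assumes the leading 4-byte field holds the byte count from the start of the file through the end of the last startup entry, which is our reading of "how much data we can preread". `order_entries`, `build_optimized_jar` and `read_optimized_jar` are hypothetical names. The reader trusts the stored central directory offset verbatim, which is the leniency this layout depends on.

```python
import struct
import zlib

# Record signatures and fixed-size layouts from the zip format (little-endian).
LOCAL_SIG, CENTRAL_SIG, EOCD_SIG = 0x04034B50, 0x02014B50, 0x06054B50
LOCAL_FMT = "<IHHHHHIIIHH"            # local file header, 30 bytes
CENTRAL_FMT = "<IHHHHHHIIIHHHHHII"    # central directory header, 46 bytes
EOCD_FMT = "<IHHHHIIH"                # end of central directory, 22 bytes
LOCAL_LEN, CENTRAL_LEN, EOCD_LEN = (
    struct.calcsize(f) for f in (LOCAL_FMT, CENTRAL_FMT, EOCD_FMT))


def order_entries(entries, startup_log):
    """Entries seen in the startup log first (first-access order), then the rest."""
    rank = {}
    for name in startup_log:
        rank.setdefault(name, len(rank))
    startup = sorted((e for e in entries if e[0] in rank), key=lambda e: rank[e[0]])
    return startup, [e for e in entries if e[0] not in rank]


def build_optimized_jar(entries, startup_log):
    """Layout: [u32 readahead][central directory][EOCD][entries...][EOCD]."""
    startup, rest = order_entries(entries, startup_log)
    ordered = startup + rest
    names = [name.encode("utf-8") for name, _ in ordered]
    cd_size = sum(CENTRAL_LEN + len(n) for n in names)
    offset = 4 + cd_size + EOCD_LEN  # first local header follows the front EOCD
    readahead = offset               # at minimum, preread the index
    central, body = [], []
    for i, (name, (_, data)) in enumerate(zip(names, ordered)):
        crc = zlib.crc32(data)
        # Stored (method 0), zeroed timestamps; sizes equal because uncompressed.
        central.append(struct.pack(CENTRAL_FMT, CENTRAL_SIG, 20, 10, 0, 0, 0, 0, crc,
                                   len(data), len(data), len(name), 0, 0, 0, 0, 0,
                                   offset) + name)
        local = struct.pack(LOCAL_FMT, LOCAL_SIG, 10, 0, 0, 0, 0, crc,
                            len(data), len(data), len(name), 0) + name
        body.append(local + data)
        offset += len(local) + len(data)
        if i < len(startup):
            readahead = offset  # end of the last entry read on startup
    # Central directory offset is 4, just past the readahead field (never 0).
    eocd = struct.pack(EOCD_FMT, EOCD_SIG, 0, 0, len(names), len(names), cd_size, 4, 0)
    return struct.pack("<I", readahead) + b"".join(central) + eocd + b"".join(body) + eocd


def read_optimized_jar(blob):
    """Lenient reader: trusts the central directory offset in the trailing EOCD."""
    sig, _, _, _, count, _, cd_offset, _ = struct.unpack_from(
        EOCD_FMT, blob, len(blob) - EOCD_LEN)
    if sig != EOCD_SIG:
        raise ValueError("no end of central directory record at end of file")
    files, pos = {}, cd_offset
    for _ in range(count):
        f = struct.unpack_from(CENTRAL_FMT, blob, pos)
        if f[0] != CENTRAL_SIG:
            raise ValueError("bad central directory entry")
        name_len, extra_len, comment_len, size, local_off = f[10], f[11], f[12], f[8], f[16]
        name = blob[pos + CENTRAL_LEN:pos + CENTRAL_LEN + name_len].decode("utf-8")
        lf = struct.unpack_from(LOCAL_FMT, blob, local_off)
        start = local_off + LOCAL_LEN + lf[9] + lf[10]  # skip header, name, extra
        files[name] = blob[start:start + size]
        pos += CENTRAL_LEN + name_len + extra_len + comment_len
    return files
```

For example, `build_optimized_jar([("a.js", b"alpha"), ("b.css", b"bravo!"), ("c.png", b"\x00\x01\x02")], ["c.png", "a.js"])` places `c.png` and `a.js` right after the index. Its readahead field then covers everything up to the end of `a.js`, so one sequential read fetches the index plus all startup data.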

This yielded a 2-3x reduction in disk IO over an unoptimized omnijar. This is on top of a >30-100x reduction achieved by going from naked files to omnijar.

The downside of my interpretation of the zip spec is that some zip programs expect zip files to be more rigid than the spec allows. Older versions of Firefox, Microsoft's zip support in Windows, WinRAR, Unix zip programs, etc. accept my optimized jars. 7zip and broken antivirus programs (it’s a security risk to be overly picky) fail.

Trivia: this isn’t the first time we got tripped up by picky zip reading code. For example, the Android apk reader irritatingly insists on having a zip entry at byte zero of an Android package. This means that one can’t use apks to do the Android equivalent of self-extracting .exe files on Windows. Michael Wu is writing a custom library loader to deal with that :)

Optimization #4: More Omnijar

Feeling that omnijar wasn’t awesome enough, Michael Wu went ahead and omnijared extensions. Most extensions will no longer need to be unpacked from xpi files. This also means that extension authors can opt to use the optimized jar format above to further speed up Firefox startup.

Other jar optimizations

Switching to jars via the startup cache will allow us to further optimize our first startup. There is also the option of halving our jar IO further by actually making use of the readahead integer I added to optimized jars.
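A sketch of what "making use of that readahead integer" could look like, in Python. It assumes the first 4 bytes of an optimized jar are a little-endian unsigned count of bytes, measured from the start of the file, that startup will need. `preread_jar` and `demo` are hypothetical. `os.posix_fadvise` with `POSIX_FADV_WILLNEED` is a real hint on platforms that provide it. The function then reads the whole prefix in one sequential request.

```python
import os
import struct
import tempfile


def preread_jar(path):
    """Read the startup prefix of an optimized jar in one sequential request.

    Assumes (hypothetically) that the first 4 bytes are a little-endian
    unsigned 32-bit count of bytes, from the start of the file, needed on startup.
    Returns (readahead, prefix_bytes).
    """
    with open(path, "rb") as f:
        header = f.read(4)
        if len(header) < 4:
            raise ValueError("file too short to hold a readahead length")
        (readahead,) = struct.unpack("<I", header)
        if readahead > 0 and hasattr(os, "posix_fadvise"):
            # Optional hint: ask the kernel to start fetching the whole prefix at once.
            os.posix_fadvise(f.fileno(), 0, readahead, os.POSIX_FADV_WILLNEED)
        f.seek(0)
        return readahead, f.read(readahead)


def demo():
    """Write a fake jar: length field, 6 bytes of startup data, then other data."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "omni.jar")  # illustrative file name
        with open(path, "wb") as f:
            f.write(struct.pack("<I", 10) + b"ABCDEF" + b"not needed at startup")
        return preread_jar(path)
```

Issuing one large read up front replaces many small, demand-driven reads scattered through startup. That is where the hoped-for IO savings would come from.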


09
Sep 10

Help Wanted: Does fcntl(F_PREALLOCATE) Work as Advertised on OSX?

To fight fragmentation, it is best to tell the OS to allocate a contiguous chunk of space for your file. With specialized APIs, the OS can do this without performing any IO (not counting metadata). I am adding support for this as part of bug 592520. Linux features posix_fallocate for preallocating files. Windows’s SetEndOfFile achieves the same result. Supposedly OSX can do this via fcntl(F_PREALLOCATE), but does it?

I’ve experimented with posix_fallocate/SetEndOfFile and determined that they both change the file size and do their best to avoid fragmentation. Unfortunately, I do not see any effect of fcntl(F_PREALLOCATE) on OS X 10.6, even though the call returns success. The file size does not change, and if I then write to the file, it seems to fragment just as much as before. Can a Mac expert demonstrate that fcntl(F_PREALLOCATE) makes any difference at all?

Update: Thanks a lot for the useful feedback; it was extremely helpful in producing this patch. It appears that the posix_fallocate equivalent is the fcntl(F_PREALLOCATE) call followed by a truncate() call (which actually forces data to be written to the file).
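A portable preallocation helper might look like the Python sketch below. It is not the patch from bug 592520. It uses `os.posix_fallocate` where the platform's C library provides it (Linux, for example). Otherwise it falls back to `os.ftruncate`, which only sets the size and guarantees nothing about contiguity. The OS X fcntl(F_PREALLOCATE) plus truncate() recipe and Windows’s SetEndOfFile are not shown. `preallocate` and `demo` are hypothetical names.

```python
import errno
import os
import tempfile


def preallocate(fd, size):
    """Grow the file behind `fd` to at least `size` bytes.

    Prefers posix_fallocate, which reserves real disk blocks up front. Falls
    back to ftruncate, which only sets the size and guarantees nothing about
    contiguity (the file may even be sparse).
    Returns the name of the method used, for illustration.
    """
    if os.fstat(fd).st_size >= size:
        return "none"
    if hasattr(os, "posix_fallocate"):
        try:
            os.posix_fallocate(fd, 0, size)
            return "posix_fallocate"
        except OSError as e:
            # Some filesystems do not support preallocation; fall through.
            if e.errno not in (errno.EOPNOTSUPP, errno.EINVAL, errno.ENOSYS):
                raise
    os.ftruncate(fd, size)
    return "ftruncate"


def demo(size=1 << 20):
    """Preallocate `size` bytes for a fresh temp file; return (method, final size)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "prealloc.bin")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            method = preallocate(fd, size)
            return method, os.fstat(fd).st_size
        finally:
            os.close(fd)
```

Both paths leave the file at the requested size. Only the posix_fallocate path actually asks the filesystem to reserve blocks, which is the property that helps against fragmentation.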


07
Sep 10

Fighting fragmentation: SQLite

Thanks to all of those who commented on my previous post on fragmentation. My first fragmentation fix has landed. In current nightlies and future releases the main Firefox databases will grow more aggressively to avoid fragmentation. This should translate into better history/awesomebar/cookie performance for our most dedicated users.
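"Growing more aggressively" means extending a file in large chunks rather than one page at a time. SQLite itself offers a `SQLITE_FCNTL_CHUNK_SIZE` file control for this kind of policy. The Python sketch below does not claim to be what the Firefox patch does; it just shows the rounding policy on a plain file. The 512 KiB chunk size and the helper names `round_up_to_chunk`, `grow_in_chunks` and `demo` are hypothetical.

```python
import errno
import os
import tempfile


def round_up_to_chunk(size, chunk):
    """Smallest multiple of `chunk` that is >= `size`."""
    return -(-size // chunk) * chunk


def grow_in_chunks(fd, required, chunk):
    """Make the file cover `required` bytes, growing in whole `chunk`-sized steps.

    Fewer, larger extensions give the filesystem a better chance to keep the
    file contiguous than many small page-sized ones. Returns the new size.
    """
    current = os.fstat(fd).st_size
    if required <= current:
        return current  # still inside the last chunk; nothing to allocate
    new_size = round_up_to_chunk(required, chunk)
    if hasattr(os, "posix_fallocate"):
        try:
            # Reserve real blocks for the newly added range.
            os.posix_fallocate(fd, current, new_size - current)
            return new_size
        except OSError as e:
            if e.errno not in (errno.EOPNOTSUPP, errno.EINVAL, errno.ENOSYS):
                raise
    os.ftruncate(fd, new_size)  # fallback: sets the size only
    return new_size


def demo(chunk=512 * 1024):
    """Grow a temp file to cover a few increasing sizes; return the sizes seen."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example.sqlite")  # illustrative file name
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            return [grow_in_chunks(fd, n, chunk) for n in (1, 4096, chunk + 1, chunk + 2)]
        finally:
            os.close(fd)
```

With a 512 KiB chunk, writes up to 512 KiB cost a single extension, and the next byte past that triggers exactly one more.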

Unfortunately, fixing existing profiles from within Firefox is hard. In the meantime, advanced users on non-Windows platforms who are suffering from fragmentation can manually copy their *.sqlite files to another directory and back.
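For that manual workaround, here is a small Python sketch. Run it only while Firefox is closed, and supply the profile directory yourself. `rewrite_sqlite_files` is a hypothetical helper, not a Mozilla tool. Rewriting each file in one sequential pass gives the filesystem a chance to allocate it contiguously; it does not guarantee it.

```python
import glob
import os
import shutil
import sqlite3
import tempfile


def rewrite_sqlite_files(profile_dir):
    """Copy every *.sqlite file in `profile_dir` out to a scratch directory and back.

    Run only while Firefox is closed. Each file is rewritten in a single
    sequential pass, which gives the filesystem a chance to lay it out anew.
    Returns the base names of the files that were rewritten.
    """
    rewritten = []
    with tempfile.TemporaryDirectory() as scratch:
        for path in sorted(glob.glob(os.path.join(profile_dir, "*.sqlite"))):
            copy = os.path.join(scratch, os.path.basename(path))
            shutil.copy2(path, copy)  # copy out...
            shutil.copy2(copy, path)  # ...and back over the original
            rewritten.append(os.path.basename(path))
    return rewritten


def demo():
    """Build a throwaway 'profile' with one database and run the rewrite on it."""
    with tempfile.TemporaryDirectory() as profile:
        db = os.path.join(profile, "cookies.sqlite")
        con = sqlite3.connect(db)
        con.execute("CREATE TABLE t (v TEXT)")
        con.execute("INSERT INTO t VALUES ('kept')")
        con.commit()
        con.close()
        names = rewrite_sqlite_files(profile)
        con = sqlite3.connect(db)
        rows = con.execute("SELECT v FROM t").fetchall()
        con.close()
    return names, rows
```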

Windows: Ahead of the pack

Evidence suggests that the fragmentation situation on Windows is slightly better than on other platforms. Firefox's fragmentation behavior on Windows is similar to that on other OSes, but Windows periodically defragments the files Firefox opens on startup. So one ends up with a cycle of deteriorating performance, followed by better performance (i.e., right after a defrag), followed by deteriorating performance again, and so on.

I haven’t observed Windows defragmenting files for me, but it seems to do this for most users. I would love to learn more about how and when it decides to defragment files.

Horror Stories

I found a few other places that are horridly affected by fragmentation; I will be blogging about those as I fix them. Fragmentation is an interesting problem to optimize because it affects dedicated users most, yet it is very tricky to replicate in a developer environment. Furthermore, there are a lot of misconceptions floating around:

  1. Fragmentation is a Windows problem that Linux is immune to due to having awesomer filesystems.
  2. Mac OSX automatically defragments files, so fragmentation isn’t a problem there.
  3. Fragmentation isn’t a problem on SSDs.

To which I say:

  1. Linux might be good at avoiding fragmentation for server workloads. It sucks for desktop users.
  2. OSX will defragment small files, but big ones hurt most.
  3. Cheap SSDs suck at tiny reads caused by fragmentation resulting in spectacularly bad IO. More on this in a future post.

To summarize: there are a lot of misleading stories floating around. I am always happy to hear more measurements/docs/bugs/etc on this subject, but I have zero patience for folk stories and speculation.

I should also mention that the fragmentation problem isn’t limited to Firefox. Other browsers suffer from it too.