The good folks at Google have written a clever tool called Syzygy, which is part of the Sawbuck project. The best summary of Syzygy comes from its design document:
Syzygy is a suite of tools to perform profile-guided, function-level reordering of 32-bit Windows PE executables, to optimize their layout for improved performance, notably for improved paging patterns.
Google wrote Syzygy for use with Chrome, but the tool is equally applicable to any large application where you want to improve performance…like Firefox. In this case, we’re concerned with improving the layout of libxul, as that’s where the bulk of the Firefox code lives. Working with Syzygy involves four major steps:
1. Instrumenting the application binary in question;
2. Running the application to collect profile data (function addresses along with invocation time);
3. Passing the profile data through an ordering generator, which comes up with the order in which functions should be laid out in the optimized binary; and finally
4. Relinking the application binary using the ordering from step 3.
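To make the ordering step a little more concrete, here is a toy sketch of what an ordering generator might do with the profile data: sort functions by the time they were first called, so that code needed early in startup ends up close together, and push never-called functions to the end. This is only an illustration of the idea, not Syzygy's actual algorithm or data format; `CallEvent` and `order_by_first_call` are hypothetical names invented for the example.

```python
from typing import Iterable, List, NamedTuple


class CallEvent(NamedTuple):
    """One profile record (hypothetical shape, not Syzygy's trace format)."""
    address: int      # start address of the function that was invoked
    timestamp: float  # when the invocation happened


def order_by_first_call(events: Iterable[CallEvent],
                        all_functions: Iterable[int]) -> List[int]:
    """Return function addresses in a proposed layout order.

    Functions that were called come first, sorted by the time of their
    first invocation; functions that were never called follow, in
    address order.
    """
    first_seen = {}
    for ev in events:
        # Keep the earliest timestamp observed for each function.
        if ev.address not in first_seen or ev.timestamp < first_seen[ev.address]:
            first_seen[ev.address] = ev.timestamp

    # Ties on timestamp are broken by address so the result is deterministic.
    called = sorted(first_seen, key=lambda addr: (first_seen[addr], addr))
    never_called = sorted(a for a in all_functions if a not in first_seen)
    return called + never_called
```

A real ordering generator has more to weigh (block sizes, page boundaries, data as well as code), but the input and output have roughly this shape: invocation records in, a list of functions in layout order out.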
Step 1 is pretty easy; Firefox just needs to be built with the Visual Studio linker’s /PROFILE switch to ensure that the instrumenter has all the information it needs. Steps 3 and 4 are likewise straightforward.
Step 2 appears to be the tricky part. Being good (that is, lazy) computer programmers, the Google folks wrote a number of scripts and programs to automate this process, as well as some benchmarking infrastructure. However, the scripts are written with Chrome in mind, and many places have Chrome-specific bits hardcoded. This is, of course, totally understandable, but it makes using those scripts with other programs difficult.
Over the past couple of weeks, I’ve been working on making Syzygy cooperate with Firefox. If you’re interested, you can see the modifications I’ve made in my sawbuck GitHub project. Things are working well enough that I can now run:
Debug/py/Scripts/benchmark --user-data-dir flobbity --no-preload --no-prefetch ~/ff-build/dist/bin/firefox.exe
(The --no-{preload,prefetch} options are required to work around Chrome-specific code that didn’t seem worth ripping out; --user-data-dir specifies which profile to use when launching Firefox.) After waiting for a minute or two, the benchmark script reports:
RESULT firefox.exe: SoftPageFaults= [23495, 32139, 23356, 23343, 23299, 23167, 23063, 23141, 23113, 23267]
RESULT firefox.exe: HardPageFaults= [1158, 10, 3, 3, 4, 2, 2, 2, 3, 2]
This is for an unoptimized binary, of course. You can clearly see the OS’s page cache at work in runs after the first.
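If you want to compare runs without eyeballing the lists, a few lines of Python can pull the numbers out of that output and separate the first (cold) run from the later (warm) ones. This is a minimal sketch that assumes the output looks exactly like the RESULT lines above; `parse_results` and `cold_and_warm` are helper names made up for this post, not part of the Syzygy scripts.

```python
import re
from statistics import mean

# Matches e.g. "RESULT firefox.exe: HardPageFaults= [1158, 10, 3]".
RESULT_RE = re.compile(r"RESULT\s+(\S+):\s+(\w+)=\s*\[([^\]]*)\]")


def parse_results(text):
    """Map (binary, metric) to the list of per-run integer samples."""
    results = {}
    for binary, metric, values in RESULT_RE.findall(text):
        results[(binary, metric)] = [
            int(v) for v in values.split(",") if v.strip()
        ]
    return results


def cold_and_warm(samples):
    """Return (first-run value, mean of the remaining runs).

    Needs at least two samples; the first run is treated as the cold
    one because nothing is in the OS page cache yet.
    """
    return samples[0], mean(samples[1:])


if __name__ == "__main__":
    output = (
        "RESULT firefox.exe: SoftPageFaults= [23495, 32139, 23356, 23343, "
        "23299, 23167, 23063, 23141, 23113, 23267]\n"
        "RESULT firefox.exe: HardPageFaults= [1158, 10, 3, 3, 4, 2, 2, 2, 3, 2]\n"
    )
    for (binary, metric), samples in parse_results(output).items():
        cold, warm = cold_and_warm(samples)
        print(f"{binary} {metric}: cold={cold} warm_mean={warm:.1f}")
```

Keeping the cold run separate matters here because an optimized layout should mostly show up as fewer hard page faults on that first run; the warm runs are already served from the cache.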
The scripts are not quite perfect yet. In particular, the call traces necessary to perform reordering don’t seem to be generated, for some peculiar reason that I haven’t ferreted out. Also, the script will indiscriminately kill any Mozilla-related apps running alongside the Firefox instances being benchmarked; I couldn’t find any good way to limit the killing to windows associated with a particular profile. (If I understand the Chrome code correctly, Chrome sets the window text of a hidden window to the full path of the profile directory in use.) But a good bit seems to work; hopefully progress will come faster now that the groundwork has been laid.