In an effort to confirm that we do want all this syzygy goodness in our release builds, I’ve been testing out syzygy on PGO builds (since we do PGO builds on Windows for releases). After removing the PEBKAC and getting a proper PGO build (which took depressingly long), I have mixed results.
First, the good news. On my laptop (Win7, Core i7, SSD), the about:startup numbers (all times in milliseconds) look like this:
Version | main | sessionRestored | firstPaint |
---|---|---|---|
Base PGO build | 265 | 3152 | 3012 |
Optimized PGO build | 234 | 2778 | 2653 |
These numbers are really encouraging; they’re actually even better than the initial numbers I posted earlier. (Though I note that they are universally slower than the earlier numbers…hm….)
There is one curious thing about these builds, though. When you look at the page fault numbers, they suggest a much different story. The (cold) numbers are what you get from starting Firefox just after a reboot; the (warm) numbers are from a second startup after the first.
Version | Hard faults (cold) | Soft faults (cold) | Hard faults (warm) | Soft faults (warm) |
---|---|---|---|---|
Base PGO build | 2507 | 41219 | 8 | 26100 |
Optimized PGO build | 2264 | 41488 | 14 | 23017 |
These numbers are totally contrary to what I saw with non-PGO builds. The optimized build isn’t consistently lower on either measure: it takes fewer hard faults cold but more warm, and more soft faults cold but fewer warm. I honestly haven’t thought very hard about what this means yet.
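To make the mixed picture easier to see, here is a small sketch that computes the per-measure deltas from the table above. The names (`FAULTS`, `fault_deltas`) are made up for illustration; the only real data is the four pairs of counts copied from the table.

```python
# Page fault counts copied from the table above, as (base, optimized) pairs.
FAULTS = {
    "hard (cold)": (2507, 2264),
    "soft (cold)": (41219, 41488),
    "hard (warm)": (8, 14),
    "soft (warm)": (26100, 23017),
}


def fault_deltas(faults):
    """Return {measure: (absolute delta, percent change)} for optimized vs. base."""
    return {
        name: (opt - base, (opt - base) / base * 100.0)
        for name, (base, opt) in faults.items()
    }


# Negative values mean the optimized build took fewer faults.
for name, (delta, pct) in fault_deltas(FAULTS).items():
    print(f"{name:12s} {delta:+6d} ({pct:+.1f}%)")
```

Keep in mind that the warm hard-fault counts are tiny (8 vs. 14), so a large percentage swing there is only a handful of faults, and all of these are single runs.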
Anyway, that’s the good news. The bad news is that on my desktop (Win XP, Core 2 Quad, mechanical drive), the about:startup numbers look like this:
Version | main | sessionRestored | firstPaint |
---|---|---|---|
Base PGO build | 1516 | 8984 | 8813 |
Optimized PGO build | 1437 | 9187 | 8828 |
(I don’t have the necessary profiling tools on my XP box for doing page fault analysis. I shouldn’t think they’d differ dramatically between the two systems, though.)
This is a little discouraging. The syzygy-optimized build is a little faster off the line, but gets edged out by the base build in getting to the points that actually matter. I haven’t thought terribly hard about these numbers, either. One possible explanation is that I turned off the XPCOM glue preloading bits, which IIUC are helpful for encouraging XP to keep its hands off your binary’s startup time. Doing that was necessary for getting postlinking to work properly. If I made that runtime-configurable, then I could run tests with the preloading enabled and see if we win there.
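For a side-by-side view of the two machines, this sketch turns the about:startup tables into percent changes relative to the base build. The structure and names (`STARTUP`, `percent_change`, `startup_changes`) are hypothetical; the numbers are the ones from the two tables in this post.

```python
# about:startup numbers from the two tables in this post: base vs. optimized.
STARTUP = {
    "laptop": {
        "base": {"main": 265, "sessionRestored": 3152, "firstPaint": 3012},
        "optimized": {"main": 234, "sessionRestored": 2778, "firstPaint": 2653},
    },
    "desktop": {
        "base": {"main": 1516, "sessionRestored": 8984, "firstPaint": 8813},
        "optimized": {"main": 1437, "sessionRestored": 9187, "firstPaint": 8828},
    },
}


def percent_change(base, optimized):
    """Percent change of optimized relative to base; negative means faster."""
    return (optimized - base) / base * 100.0


def startup_changes(startup):
    """Return {machine: {metric: percent change}} for every machine and metric."""
    return {
        machine: {
            metric: percent_change(builds["base"][metric], builds["optimized"][metric])
            for metric in builds["base"]
        }
        for machine, builds in startup.items()
    }


for machine, metrics in startup_changes(STARTUP).items():
    for metric, pct in metrics.items():
        print(f"{machine:8s} {metric:16s} {pct:+.1f}%")
```

On the laptop every metric comes out negative (faster); on the desktop only `main` does, which is the "faster off the line, then edged out" pattern described above. These are single measurements, so small differences may well be noise.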
Bottom line: We would win on leading-edge machines, but we wouldn’t see a lot of benefit on older machines.
Also, if Microsoft would add a drop_caches lookalike, that would be fantastic.