pgo startup times with syzygy

In an effort to confirm that we do want all this syzygy goodness in our release builds, I’ve been testing out syzygy on PGO builds (since we do PGO builds on Windows for releases). After removing the PEBKAC and getting a proper PGO build–which took depressingly long–I have mixed results.

First, the good news. On my laptop (Win7, Core i7, SSD), the about:startup numbers look like this:

Version main sessionRestored firstPaint
Base PGO build 265 3152 3012
Optimized PGO build 234 2778 2653

These numbers are really encouraging; they’re actually even better than the initial numbers I posted earlier.  (Though I note that they are universally slower than the earlier numbers…hm….)

There is one curious thing about these builds, though. When you look at the page fault numbers, they suggest a much different story. The (cold) numbers are what you get from starting Firefox just after a reboot; the (warm) numbers are from a second startup after the first.

Version Hard faults (cold) Soft faults (cold) Hard faults (warm) Soft faults (warm)
Base PGO build 2507 41219 8 26100
Optimized PGO build 2264 41488 14 23017

These numbers are totally contrary to what I saw with non-PGO builds. We’re not consistently lower in the optimized build on either measure. I honestly haven’t thought very hard about what this means yet.

Anyway, that’s the good news. The bad news is that on my desktop (Win XP, Core 2 Quad, mechanical drive), the about:startup numbers look like this:

Version main sessionRestored firstPaint
Base PGO build 1516 8984 8813
Optimized PGO build 1437 9187 8828

(I don’t have the necessary profiling tools on my XP box for doing page fault analysis. I shouldn’t think they’d differ dramatically between the two systems, though.)

This is a little discouraging. The syzygy-optimized build is a little faster off the line, but gets edged out by the base build in getting to the points that actually matter. I haven’t thought terribly hard about these numbers, either. One possibility is that I did turn off the XPCOM glue preloading bits, which IIUC correctly are helpful for encouraging XP to keep its hands off your binary’s startup time. Doing that was necessary for getting postlinking to work properly. If I made that runtime-configurable, then I could run tests with the preloading enabled and see if we win there.

Bottom line: We would win on leading-edge machines, but we wouldn’t see a lot of benefit on older machines.

Also, if Microsoft would add a drop_caches lookalike, that would be fantastic.

2 comments

  1. Interesting. Do you have any insight into what syzygy is rearranging? The PGO optimizer does some reordering and code block separation, so I wonder if it and syzygy are disagreeing on something.

    Also, are you taking the optimized binary, running it through the startup profiler, and then optimizing that result?

    Finally, can I say I hate the syzygy project name? 🙂 So hard to type!

    • Nathan Froyd

      I believe syzygy just orders functions; it doesn’t try to do anything fancy with basic blocks. I was indeed taking the PGO binary, running it through syzygy’s instrumentation etc. and then rearranging that.

      The startup times are so different between what I had earlier and now that I’m attempting to bisect the month of August to see when (if?) startup time got noticeably worse.