Axel Hecht Mozilla in Your Language

February 7, 2008

Builds, shuttle buses and cabs

Filed under: Mozilla — Axel Hecht @ 5:52 pm

There have been a few blog posts recently about when to do builds. I’d like to add a few thoughts of mine.

I’ll mostly follow two trains of thought:
* each check-in raises a bunch of questions, to which we want answers — quick
* the road to these answers has a limited traffic capacity

The current proposals map to two images in my head: the current “build continuously” model is basically a shuttle bus, while “build on check-in” is more like a cab. Now, the interesting artifact in our picture is that both the shuttle bus and the cab have unlimited capacity for passengers, or check-ins. Do check-ins blend? Yes, they do.

Now, the blending of patches has an upside and a downside. On the upside, it enables us to get around traffic jams: we can just transport as many check-ins as we get. The downside is that the answers the builds and test runs give can’t be associated with individual check-ins anymore. Well, “passes” can, “failures” cannot. I’ll postpone perf testing here; a 10% win followed by a 9% loss, and you end up roughly where you were.

There have been previous posts on whether shuttle buses or cabs are the way to go, and my answer is “neither”. There is an easy answer if you assume you have no limit on machines: in that case, just let them run on check-in. That’s great, at least as long as you can actually relate the resulting build, and the tests run on that build, to a source tree. Once we’re running out of machines, the story is a little different. Every machine should be continuously building then, and the trick is in the ‘then’. That is, each build should adjust the time it waits for more check-ins such that, by the time the last idle machine kicks off, all available machines are fairly distributed across the ETA of the next machine. Let’s pick some arbitrary number for an initial stabilization time, Tmin. The waiting time for machine n of N could then be

Tmin * (N-n)/(N-1) + ETA/2 * (n-1)/(N-1)

if we choose to weight linearly. I did a little scatterplot game for you to pick different numbers of slaves, mintimes, ETAs and such. I bet there is a way to pick better values for Tmin and the power based on bonsai and tinderbox statistics.
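To make the linear weighting concrete, here is a minimal Python sketch of that formula. The function name and the example numbers are mine, not from any real tinderbox or buildbot code:

```python
def waiting_time(n, N, t_min, eta):
    """Waiting time before machine n of N kicks off its build.

    Linear interpolation between the endpoints: machine 1 waits the
    initial stabilization time Tmin, machine N waits ETA/2, so the
    kick-offs spread fairly across the expected build duration.
    """
    if N == 1:
        return t_min
    return t_min * (N - n) / (N - 1) + (eta / 2) * (n - 1) / (N - 1)

# Example: 4 slaves, 5-minute stabilization, 60-minute build ETA.
# Machine 1 waits 5.0 minutes, machine 4 waits 30.0 minutes.
for n in range(1, 5):
    print(n, waiting_time(n, 4, t_min=5.0, eta=60.0))
```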

Sadly, neither tinderbox nor buildbot offer this, but I could imagine that this would be of more general use to buildbot clients, and would be something to get upstream.

The other part of the picture is “did this particular change impact ???”. Now, for questions like “does this compile?”, the answer is fairly trivial. I think the same goes for things like unit tests or ref tests. As long as the current state of the tree passes, we’re fine. When it fails, that’s a more interesting question. Like, it might make sense to then actually refine the built source stamps.
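Refining the built source stamps on a failure is essentially a bisection over the blended check-ins. A minimal sketch, assuming a hypothetical `passes()` predicate that builds and tests the tree at a given check-in:

```python
def refine(changes, passes):
    """Find the first bad check-in inside a blended, failing range.

    `changes` is a chronological list of check-in ids; `passes(cs)`
    builds and tests the tree as of change `cs` and returns True on
    green. Assumes the tree was green before changes[0] and that the
    blended build containing changes[-1] failed.
    """
    lo, hi = 0, len(changes) - 1  # changes[hi] is known bad
    while lo < hi:
        mid = (lo + hi) // 2
        if passes(changes[mid]):
            lo = mid + 1   # failure was introduced later
        else:
            hi = mid       # changes[mid] already fails
    return changes[lo]
```

Each round costs one extra build, so isolating one bad check-in out of a blend of k takes about log2(k) builds.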

But for questions regarding performance, there are worst-case scenarios. Say you see a 1% regression. Is it a 1% regression from patch 2, or is patch 2 actually improving performance while patch 1 just totally borked it? On top of that, performance data is noisy data. There’s likely a good heuristic for distributing performance sampling across builds, based on the total count of test runs per build, the age of the build, and the current noise in the performance data for that build. So in particular for performance testing, it would be interesting to not just build the latest well-defined source state (thanks, cvs), but also to be able to build previously unbuilt source states, and, in the performance architecture, to spend the available cycles refining the statistics on a range of recent builds instead of just the latest. And to, of course, relate those data points with the source stamp of the build that was tested.
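As a toy illustration of such a heuristic, here is one way to score which build deserves the next perf run. The weighting and coefficients are entirely hypothetical placeholders for the three signals named above (sample count, build age, noise):

```python
import math

def sampling_priority(runs, age_hours, stddev, mean):
    """Toy priority score for picking the build to re-run perf tests on.

    Hypothetical weighting: few samples so far, a recent build, and
    noisy numbers all raise the priority. The 24-hour decay constant
    is an arbitrary placeholder, not a measured value.
    """
    noise = stddev / mean if mean else 0.0            # relative noise
    freshness = math.exp(-age_hours / 24.0)           # newer is better
    return noise / math.sqrt(runs + 1) * freshness

# The scheduler would hand the next idle perf slave the build with
# the highest score, rather than always the newest one.
```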

2 Comments

  1. So what about doing one build/test cycle per checkin, and not “blending” patches?

    Pretty sure we have the capacity to allocate a set of slaves to every single checkin we get, without making anyone wait in line.

    This is how the tryserver works now, in fact, but we haven’t given it many slaves and we’re not using buildbot’s Source class (it can be told that checkins are unmergeable, for instance), so we could do a cleaner solution.

    It seems like this would enable the kind of things you are considering in the rest of the post, no?

    Comment by Rob Helmer — February 7, 2008 @ 7:45 pm

  2. I somehow don’t believe that we should have enough slaves to catch up with Reed’s check-in parties :-)

    I’m really not sure if it’s cost-effective to beef up the infrastructure for the maximum traffic, and even if we do, going to the limit of that should be part of the plan.

    I think the treat is to not blend patches, yes. The bigger treat would be to not compile the non-merged source stamps in chronological order, but latest-first, and to put older source stamps onto some kind of idle scheduler.

    Wow, that’d be horrible for a waterfall display, but there’s probably a good chance to find something to make up for that.

    Comment by Axel Hecht — February 8, 2008 @ 8:51 am
