Thoughts on killing tinderbox, foundations

May 10, 2009

Filed under: Mozilla — Tags: buildbot, Mozilla — Axel Hecht @ 9:45 pm

I figured it’d be a good idea to just dump my thoughts on killing tinderbox (as in the web interface to Mozilla’s builds). As the background information alone grows into a lengthy post, I’ll cut this into pieces.

To point you where I’m heading, the executive summary of my thinking is:

Killing tinderbox is a webdev problem for the most part, with some chunks in IT and build/releng. The latter two should mostly be fine doing what webdev needs to get the job done.

I won’t give my complete rationale for killing tinderbox, for the most part because I’ve been thinking about that too long and have come up with too many reasons to write them down. But the most important fragments would be in …

The Rationale:

Tinderbox knows relatively little about our builds, and displays even less. The front end is hard to hack, and the back end is tied to a build model that doesn’t match that of buildbot. With our move to hg in particular, things have changed considerably on the back end.

Listening in to previous discussions, there seems to be a gap between how people talk about our builds, and what our builds really are. Thus I’ll bore the few people that actually hack on our build automation for a few minutes, and dive into …

What buildbot knows:

Buildbot, aka the software we’re using to control and run our builds, knows plenty about our builds. I’ll give a list of things that come to mind, with a focus on things that tinderbox doesn’t know.

- Why a build is running
  - Which changes went into this build, or whether it was a periodic or forced build
- The steps of each build, with separate description, results, and logs
  - Logs have separate stdout and stderr chunks, kept in order
- A set of build properties, holding slave name, build number, and revision
  - The set of build properties can be amended to hold more data. The data can be basically anything that can be pickled in Python, and could be constrained to JSON values, or just the primitive ones among them.
- Start and end times for both the individual steps and the complete build
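To make that list a bit more tangible, here is a minimal sketch of the same data as plain Python records. This is not buildbot's actual class layout; `Build`, `Step`, `set_property`, `duration` and `stream` are hypothetical names made up for illustration, and the builder, slave and revision values are placeholders. The only assumption beyond the list above is that properties get constrained to JSON values, which is one of the options mentioned there.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Step:
    name: str
    description: str
    result: str        # e.g. "success" or "failure" (illustrative values)
    started: float     # start time in seconds
    finished: float    # end time in seconds
    # Log chunks in arrival order, each tagged "stdout" or "stderr".
    log: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Build:
    builder: str
    reason: str        # "change", "periodic" or "forced" (illustrative values)
    changes: List[str] = field(default_factory=list)
    properties: Dict[str, Any] = field(default_factory=dict)
    steps: List[Step] = field(default_factory=list)

    def set_property(self, key: str, value: Any) -> None:
        """Store a property, accepting only JSON-serializable values."""
        try:
            json.dumps(value)
        except (TypeError, ValueError) as exc:
            raise ValueError(f"property {key!r} is not a JSON value") from exc
        self.properties[key] = value

    def duration(self) -> Optional[float]:
        """Time from the first step's start to the last step's end."""
        if not self.steps:
            return None
        return self.steps[-1].finished - self.steps[0].started


def stream(step: Step, channel: str) -> str:
    """Join the chunks of one channel, preserving their order."""
    return "".join(text for name, text in step.log if name == channel)


# Placeholder data: one build caused by a change, with two steps.
build = Build(builder="linux-example", reason="change", changes=["abc123"])
build.set_property("slavename", "slave-01")
build.set_property("buildnumber", 7)
build.steps.append(Step("compile", "compiling", "success", 0.0, 90.0,
                        [("stdout", "make\n"), ("stderr", "warning\n"),
                         ("stdout", "done\n")]))
build.steps.append(Step("test", "testing", "success", 90.0, 120.0))
```

Keeping the log as ordered, tagged chunks means a frontend can either interleave stdout and stderr as they happened or pull out just one of them.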

There are some shortcomings, in particular when it comes to our build setup, mostly …

What even buildbot doesn’t know:

Dependencies between builds. Buildbot has two builtin methods to run builds that depend on prior builds, but it doesn’t keep track of that relationship.

For those into schema, one possible version of that is depicted in this graph.
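Independent of that graph, a minimal sketch of what recording the relationship could look like, assuming a hypothetical two-table layout in SQLite. The table names, column names and helper functions are made up for illustration; they are not buildbot's, and the builder names are placeholders.

```python
import sqlite3

# Hypothetical schema: one row per build, plus one row per
# "this build was started because of that build" relationship.
SCHEMA = """
CREATE TABLE builds (
    id INTEGER PRIMARY KEY,
    builder TEXT NOT NULL,
    revision TEXT NOT NULL
);
CREATE TABLE build_dependencies (
    upstream_id INTEGER NOT NULL REFERENCES builds(id),
    downstream_id INTEGER NOT NULL REFERENCES builds(id),
    PRIMARY KEY (upstream_id, downstream_id)
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def add_build(db, builder, revision, upstream=None):
    """Record a build and, optionally, the build it depends on."""
    cur = db.execute(
        "INSERT INTO builds (builder, revision) VALUES (?, ?)",
        (builder, revision),
    )
    build_id = cur.lastrowid
    if upstream is not None:
        db.execute(
            "INSERT INTO build_dependencies (upstream_id, downstream_id) "
            "VALUES (?, ?)",
            (upstream, build_id),
        )
    return build_id


def downstream_of(db, build_id):
    """Return the builder names of builds that depend on build_id."""
    rows = db.execute(
        "SELECT b.builder FROM build_dependencies d "
        "JOIN builds b ON b.id = d.downstream_id "
        "WHERE d.upstream_id = ? ORDER BY b.id",
        (build_id,),
    )
    return [row[0] for row in rows]
```

With something like this, a frontend could attach the test builds to the compile they ran against:

```python
db = open_db()
compile_id = add_build(db, "linux-compile", "abc123")
add_build(db, "linux-unittest", "abc123", upstream=compile_id)
print(downstream_of(db, compile_id))
```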

Then, there are things we keep …

On tinderbox, but not on buildbot:

- Tree rules
  - open/closed (used to be on despot for cvs/bonsai)
  - sheriff
- Build comments
- Log parsers

So much for the read-only side of life. On top of this, there are a few important things that buildbot enables us to do, which we don’t empower our community to use (at least not without a releng-sheriff around).

Buildbot can:

- Trigger builds on arbitrary builders, possibly with particular properties set (the latter requires hackery).
- Stop most builds while running.

Exposing these should provide a powerful tool to investigate and clear bustages.

You can get a slightly better idea of how things look on buildbot itself if you browse around on Chromium’s waterfall. IMHO, they share our problem of not being able to present the data they have, even though they have fewer platforms and trees to handle than we do. You can also see the problem of dependent test builds hanging somewhere in the air. And you can nicely see the per-step output with the details they have, though it is shown unconditionally, and most of the time you likely don’t care.

Going forward, I’ll try to wrap my head around which problems our web frontend to our builds actually needs to solve, and which routes I see to getting there.

4 Comments

I think you’re thinking in the right direction. Two things I don’t see yet in any of our replacement UIs or thinking:

– A central place to set tree status and have that information pulled.

You touched on this a bit, but I’ve really been thinking about this for comm-central – we have 3 apps using one repo, and we need a central place to set tree status and notices – not just for that repo but for the apps that are there, i.e. we sometimes close TB but leave SM and calendar open.

Of course, other extensions/UIs need to be able to pull this information as well.

– An indication of how long it will be before my changeset gets built.

How long before the builds will start, how long do the builds running have left.

I realise an ETA is hard to calculate, especially with clobbers or size of patch, but it can help to know roughly when to check back on the tree, rather than being glued to refreshing the web page all the time.

Comment by Standard8 — May 11, 2009 @ 5:16 am
You forgot:

* Buildbot doesn’t have any sort of scalability; you’d better hope twistd can handle the http load.

And if it can, you’d better hope it doesn’t crap itself in a way that causes it to lose connections to the slaves… because you know what that means: failed builds!

* Buildbot has no authentication. When some kiddies decide IRC isn’t enough and it’s time to annoy everyone by constantly forcing builds, stopping them will involve blocking pages you probably care about.

Add a .htaccess file? Oh right, this is twistd, not Apache.

Not many people will defend tinderbox, but there’s a reason it’s been around for so long.

It would be good to see people address these problems, instead of handwave about why they aren’t actually problems.

It’s interesting to me that for all the buildbot fanboys, people still seem to think the waterfall is a bad way to represent build data… except, they ignore the fact that buildbot’s only representation of that data is (still themes on) a waterfall.

Comment by Preed — May 11, 2009 @ 11:48 pm
I’ll detail a bit on my thinking of what web interfaces should do in a follow up post.

I like waterfalls for particular aspects, in particular as a kind of activity monitor. But they fall short in tons of aspects, which is why they’re not the sole truth.

preed, we had a few discussions on build web interface wishlists so far, and they sadly showed that people don’t know what our builds look like. I only referred to the chromium waterfall to give those some hands on experience that we had for years now, so that they can get on the same page. I have no intent to suggest that we go for that web interface, nor its underlying arch.

Comment by Axel Hecht — May 12, 2009 @ 1:01 am
My workflow currently looks like:

{changes}
{local check}
hg commit
hg push
{remote check}
{remote build}
{d/l and check}

Tinderbox gets me the remote check and shows me the build has happened (and if anything’s gone wrong), but it’s hardly streamlined.

You’d get super-awesome bonus points if I could look at the hg changelog on the web, and click through to the results of a compile and the downloads from it. How hard that would be, I don’t know.

Comment by Mark Tyndall — May 12, 2009 @ 1:25 am