Axel Hecht Mozilla in Your Language

November 15, 2010

As sure as logs are logs

Filed under: Mozilla — Axel Hecht @ 12:04 pm

… or not.

As promised, I’ll write a bit about build logs today. You’ll see what our logs are, and, to begin with, I’ll take you on a tour through buildMessage to explain how the logs we have end up being what you see served off of tinderbox.

First off, buildbot is basically the same thing as any regular gecko app, one main thread and loads of callbacks. So when reading on, all your spontaneous reactions are good.

The buildMessage code does:

  1. synchronous IO to load all logs of a build into memory, basically up to some 70M
  2. synchronous string handling to paste all that data together, with some extra padding
  3. synchronous compression of the resulting string
  4. synchronous base64 encoding of the compressed string

All on the main thread, all in one go, blocking. All of that to give you a single lengthy unformatted blob of text. Why?

Because our build logs are actually not a single lengthy unformatted blob of text, and a single blob is exactly what tinderbox wants.
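
To make that concrete, here is a minimal sketch of the flow. This is not the actual buildbot source, just its shape; the getLogs()/getText() accessors loosely follow buildbot’s status API, the rest is made up:

    import zlib, base64

    def build_message(build):
        # 1. synchronous IO: pull every log of the build into memory (up to ~70M)
        texts = [log.getText() for log in build.getLogs()]
        # 2. synchronous string handling: paste it all together, with some padding
        blob = '\n========= log =========\n'.join(texts)
        # 3. synchronous compression of the resulting string
        compressed = zlib.compress(blob)
        # 4. synchronous base64 encoding of the compressed string
        return base64.b64encode(compressed)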

Let’s have a peek into what our build logs are, really. In my previous posts, I introduced you to the concept of build steps. They’re really the basic entity of work to be done for a build. Now, the logs are stored in buildbot pretty much as the data comes in, that is, each log is associated with a step, and the storage happens as the chunks arrive. Commonly, that’d be stdout and stderr data coming from shell commands run on the slave. The information about which stream the data is on is persisted, too, as is the order, so any log basically looks like this:

Step reference
  header  length  data
  stdout  length  data
  stdout  length  data
  stdout  length  data
  stderr  length  data
  stdout  length  data
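
Conceptually, a log is that step reference plus a sequence of (channel, data) chunks in arrival order; buildbot numbers the channels internally. A tiny sketch of pulling a single channel back out, purely for illustration:

    STDOUT, STDERR, HEADER = 0, 1, 2   # buildbot's channel numbering, if memory serves

    def text_for(chunks, channel=STDOUT):
        # chunks: iterable of (channel, data) pairs, in the order they arrived
        return ''.join(data for chan, data in chunks if chan == channel)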

As most of you aren’t among the few privileged ones to actually look at the real logs, I’ve set up a fake log page for you to take a look at. It’s an l10n repack, mostly because they’re somewhat small in both step count and log size, and because I’m used to them. Here’s the actual make step highlighted. You can see the introduction being shown in blue, which is the common color for header chunks. Buildbot just uses that channel to show setup and shutdown information on the step. Then there’s the actual make output in black. If there was something on stderr, it’d be styled in red. Sorry, I couldn’t quickly come up with an example that has stderr output.

The first take-away is that you can get to just the build output of the step you’re interested in.

If you’re nostalgic, you can check the tinderbox checkbox; the CSS style sheet changes to show you what you’d get from tinderbox. Try to find the same information again.

One further detail, there can be more than one log per step. Buildsteps that set build properties quite commonly have two logs, one that keeps track of the command that got run, and another that keeps track of the actually changed build properties. You can look at an example in the builddir step. The boring last line is the second log.

Log files are really not all that complicated, and much more useful than what we get back from tinderbox. Let’s look at some of the pros:

Log files come in as the build goes. This enables buildbot to publish build logs in almost realtime. There’s little-to-no cost to that, either; a simple node.js proxy can ensure that only one log is read at any time. Another benefit is that one can archive logs incrementally, removing the current stress on the masters to publish more data than they want to chew in one go.

Log files are per task. As the logs are associated with a step, which has a name and a builder, there’s pretty rich information available on what the data in question is actually about. Think about hg-specific error parsers for one step, ftp-specific ones for the next, and mochitest-specific ones for the one after that. All in one build. If we archive the raw data, we can easily improve our parsers and stay compatible with old logs, or add new steps to the build process without fear of breaking existing log parsers.

Tinderbox can still be fed. Even if we’re not sending out tinderbox log mails from the masters, we can still do the processing out of band in an external process or even on an external machine, offload the masters, and not force ourselves to change all the infrastructure in one go.

There is a hard piece, too: storage. Build logs are plenty, and they’re anywhere from a dozen bytes to 70M, within the same build, even. There are hundreds of thousands of small files, and thousands of really large ones. I hope that adding some information on what our build logs really are helps to spark a design discussion on this. Whether to compress, and at which level. Retention, per step type, even? Store as single files, in one dir, or in a hierarchy, or as tarballs? Or all of the above as part of retention? Is hbase a fit?

November 12, 2010

Counting sourcestamps, changes, and faking data

Filed under: Mozilla — Axel Hecht @ 8:31 am

As a follow up to my previous post on my digging through our build status, I want to look at it in a bit more detail, pretend it’d all be simple, show what it could be, and, well, add the promised chocolate to coconut. Bounty.

Let’s look at the actual data for two and a half landings. First, I’ll start with a rather simple landing by roc, revision 1b43… on mozilla-central. Let me summarize the builds real quick:

1 changeset in 1 push.

27 change rows in status db.

16 different branch names.

106 sourcestamps in status db.

245 builds.

That’s a lot, because, what we’re really interested in is

1 push, 245 builds.
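
For the record, the counting itself is straightforward once something like django_moz_status (introduced in the previous post) can talk to the status db; the model and field names in this sketch are assumptions, not the actual code:

    from status.models import Change, SourceStamp, Build   # hypothetical models

    rev = '1b43'  # prefix of the revision above
    changes = Change.objects.filter(revision__startswith=rev)
    stamps = SourceStamp.objects.filter(changes__revision__startswith=rev).distinct()
    builds = Build.objects.filter(sourcestamp__in=stamps)
    print changes.count(), stamps.count(), builds.count()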

Talk is cheap, but what’s really cheap is manipulating other people’s databases, so while Vettel was running in circles in Brazil, I was running circles in the db and manually stitched things back together. The result is still coconut, but with chocolate, so here is the same url in bounty.

1 source, 1 change, 253 builds.

Wait, what, not 245? No, 253, because, well, there are more disconnects in the status, so the query in the database doesn’t catch them all. That’s what you need manual stitching for. Also, finding the right sourcestamp to keep isn’t trivial.

Which is why I only did it for a few builds. Sorry, you’ll not find a lot of builds that are stitched together, so enjoy the guided tour through the few shiny places.

During my needlework I came across another set of changes which are worthwhile to include into today’s tour. It’s two pushes, by khuey and vlad. Let’s give Count Count a rest and look at them with chocolate right away, the builds for khuey’s revision and vlad’s. What you’ll notice is that some of the builds for khuey aren’t there, but lumped together with vlad’s. Why’s that?

Our build infrastructure really doesn’t know about pushes. As I’ve detailed in my previous post, there are sourcestamps and changes, but no further grouping. At this point it’s really a design decision on whether the buildbot changes are hg pushes or hg changesets. This decision is currently in favor of hg changesets, which may just be right. At that point, the scheduling logic that puts changes into builds needs to put extra care into creating sourcestamps for what we intend to get builds for, and to keep those sourcestamps apart. The current implementation puts anything coming in within three minutes into the same sourcestamp, which is somewhat of a load limiter.
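
That three-minute window is presumably just buildbot’s treeStableTimer at work: a scheduler configured along these lines lumps every change that arrives within the window into one sourcestamp (a sketch, not the actual releng config; builder names made up):

    from buildbot.scheduler import Scheduler

    s = Scheduler(name='mozilla-central',
                  branch='mozilla-central',
                  treeStableTimer=3 * 60,  # seconds; changes within the window
                  builderNames=['linux', 'macosx', 'win32'])  # share one sourcestamp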

Anyway, back to chocolate. When you looked at the pages, did you realize that once you add it, you’re almost at a tinderboxpushlog page? Right, it could be that “easy”.

What’s between reality and chocolate? Well, sendchange. That’s a buildbot architecture component that allows shell commands to insert changes into a buildmaster. It’s rather limited in the data it can transport, which is why we lose data on the way. There’s an alternative feature called trigger, which doesn’t. There is an open ticket to make that span different buildmasters, but given how much Mozilla invested into schedulerdb, let’s pray it’s easy to fix. Filed bug 611670.

Update: Changed links to l10n community server.

November 10, 2010

Looking at the internals of our builds

Filed under: Mozilla — Axel Hecht @ 9:34 am

Chris Atlee has put up database dumps of both the scheduler and the status databases. These databases are the most detailed and (almost; the status db is not) first class information on what our builds are really doing. The current code on top of that is all pylons-based, and I am, like many other Mozillians, part of the django shop. So for one I figured “let’s see if I can make django read this”.

I can. It’s a bit rough, though, as some of the tables for many-to-many relationships don’t contain a primary key column. Django really doesn’t like that, and thus there are some things that you cannot do without those columns. Most of it works just fine, though. This holds at least for the status db, which I looked at in more detail, but the scheduler db ain’t much different. Filed bug 611014.

The code for this is up at django_moz_status on github.
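
If you want to follow along, the trick boils down to unmanaged models over the existing tables, roughly like this sketch; the table and column names here are made up, the real models are in the repo:

    from django.db import models

    class Build(models.Model):
        buildnumber = models.IntegerField()
        starttime = models.DateTimeField(null=True)
        endtime = models.DateTimeField(null=True)
        result = models.IntegerField(null=True)

        class Meta:
            db_table = 'builds'  # reuse the table from the status db dump
            managed = False      # never let syncdb touch it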

Of course, me being able to talk to the db with a python shell won’t help you much, right? So I’ve spent a few more hours to actually create a really rough website on top of it, which I want to share with you.

Coconut. Hard to open, and once you get there, you may not like it. I have a thing with project names.

Coconut is a bunch of django views on top of django_moz_status, held together by site_demo. You can see it in action (for now) on the l10n community server. This view is exposing three concepts of buildbot, and how they play together:

Sourcestamps (first column): Every build in buildbot has a sourcestamp, and a sourcestamp can have multiple builds. The sourcestamp knows a branch, a revision, and a list of changes going into the builds associated with it.

Changes (second column): Changes are the external “real life” events that may or may not trigger builds. In this view, you see a few lists of changes that look like a push to hg (and are just that), as well as a plethora of changes by Mr. sendchange and Mr. sendchange-unittest. If you remove some query params on the above URL, you can also see a bunch of sourcestamps without changes. Those are nightlies.

Builds (last column): Each sourcestamp can have multiple builds, I’m just showing the builder name (a symbolic short name), the buildnumber, and the result as color. The third column is actually a guess on the platform of the build, based on the platform build property. If that’s not set, unknown is used.
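
In ORM terms, that page boils down to walking the sourcestamps and their related objects, something like this (model names assumed, as above):

    # assuming the unmanaged models from django_moz_status (SourceStamp etc.)
    for ss in SourceStamp.objects.order_by('-id')[:25]:  # first column
        changes = ss.changes.all()  # second column; empty for nightlies
        builds = ss.builds.all()    # last column, colored by build result
        # the platform column is a guess based on the 'platform' build property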

Which brings me nicely to another two pieces of our build infrastructure that have been hard to look at so far. Build steps and build properties. Surf along? Let’s look at a build.

The first section of the build pages shows some general information: builder, buildnumber, status, the start and end time, how long the build took. It also lists the buildbot master, and the category of the builder. Categories allow filtering builds; sadly, a builder can only be in one category.

Next up is build steps. Each step in a build is an item of work, and an entity of failure. Different steps can handle failures differently, too. You can see that the build starts with a flock of steps that do administrative tasks on the slave. You can see which fraction of the build’s time a step took by looking at the bar in the second column. You’ll see that the majority of that build went into the compile step. And that that passed. And after that compile, there’s a bunch of administrative stuff again.

There are two things that you do not see. One is, each of those steps has build logs attached to it. Commonly one, but possibly more. I’ll talk more about logs in a different post. The other thing is, steps can change the build properties. Which is to say, the build properties shown at the end of the build page are not static, but change during the build run.

Build properties? Right, within buildbot, several objects can have properties, among them, builds. You’ll find things like the buildnumber, the slavename, the branch, buildername (pretty). You’ll also find a host of things around the packaging of the build. Quite generally, our builds try to put things that are needed for the build itself, or for the logic around it, into build properties.

The end of the page is reiterating which changes are associated with the sourcestamp for this build.

Let me strain your patience once more and invite you to visit a build with a failed step. In this build page you can see how the clobber step worked fine, and took quite some time, and how the actual status of the build comes from the test step failing with a warning.

Now this post is already pretty lengthy, so I’ll take a break here and invite you to go in and click back and forth a bit, and to do some url hacking. If you think this is rough and you’re having a hard time figuring out what’s why, I’ll do a follow up post on how to add chocolate to coconut.

PS: the database this instance is working on is a snapshot that ends in August, details may be different today. I shrunk the database, too, only the last 10k builds still have the buildsteps.

Update: Changed the links to the l10n community server.

November 6, 2009

PS: l10n-merge

Filed under: L10n,Mozilla — Axel Hecht @ 8:02 am

Armen just blogged about this, and as it’s constantly mentioned around l10n, I wanted to add a bit more detail to l10n-merge.

l10n-merge is originally an idea by our Japanese localizer dynamis. The current implementation used in the builds is by me, integrated as an option to compare-locales. There are spin-offs of that algorithm in the silme library, too.

l10n-merge attempts to solve one reason for “yellow screens of death”, i.e., XML parsing errors triggered by incomplete localizations. This is really crucial, as localizations don’t just pop up by waving a magic wand; they’re incremental work, and a huge chunk of that. So in order to test your work, you need to see the strings you have in, say, Firefox, without having the other 4000 strings done yet. Other l10n infrastructures handle this by falling back to the original language at runtime (gettext), but doing that at runtime of course has a perf impact, and a size impact. l10n-merge does the same thing at compile (repackaging) time.

Design goals for l10n-merge were:

  • not mess with any source repositories
  • not do any file-io that’s not really needed

Thus, in order to not mess with the source repos, l10n-merge doesn’t modify the sources inline, but creates copies of the files it touches in a separate dir. Commonly, we’re using ‘merged’ in the build dir. Now, creating a full copy of everything would be tons of file io, so l10n-merge only creates copies for those files which actually need to get entities added to existing localized content. This plays together with code in JarMaker.py, which is able to pick up locale chrome content from several source dirs.
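
The core of that is small enough to sketch. This is not the actual compare-locales code, just the idea, with DTD-only syntax for brevity:

    import os

    def merge_file(ref, l10n, relpath, l10ndir, mergedir):
        # ref, l10n: dicts of entity name -> value, for en-US and the localization
        missing = [key for key in ref if key not in l10n]
        if not missing:
            return  # good file, no copy needed
        dest = os.path.join(mergedir, relpath)
        if not os.path.isdir(os.path.dirname(dest)):
            os.makedirs(os.path.dirname(dest))
        with open(os.path.join(l10ndir, relpath)) as src, open(dest, 'w') as out:
            out.write(src.read())  # the translated content, untouched
            for key in missing:    # padded with the en-US values at the end
                out.write('\n<!ENTITY %s "%s">' % (key, ref[key]))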

A Firefox localization contains some 450 files, and say for the current 9 B1-to-B2 missing strings in two files, it would copy over those two files from l10n and add the missing entities to the end. Then JarMaker is called with the right options; for those two files it will pick them up from merged, while the rest of the localization comes from l10n. For missing files, it actually looks into the en-US sources, too, so we don’t have to do anything for those. To give an example, for chrome/browser/foo in the browser ‘module’, it searches:

  1. .../merged/browser/chrome/foo
  2. l10n/ab-CD/browser/chrome/foo
  3. mozilla/browser/locales/en-US/chrome/foo
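
The lookup is essentially a first-hit-wins walk over those source dirs; a sketch of the idea, not the actual JarMaker.py code:

    import os

    def resolve(relpath, sourcedirs):
        """Return the first existing copy of relpath among the source dirs."""
        for base in sourcedirs:
            candidate = os.path.join(base, relpath)
            if os.path.exists(candidate):
                return candidate
        raise KeyError('%s not found in any source dir' % relpath)

    # merged first, then the localization, then en-US as the last resort
    resolve('chrome/foo', ['merged/browser',
                           'l10n/ab-CD/browser',
                           'mozilla/browser/locales/en-US'])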

Now it’s time to list some pitfalls that come with l10n-merge:

  • If you’re passing the wrong dir for mergedir, nothing breaks. All build logic breakage would come from missing files, and due to the fallback to en-US, there are no missing files.
  • l10n-merge, as compare-locales, doesn’t cover XML parsing errors inside entity values yet. Bug 504339 is filed, there are some tricky questions on reporting, as well as having to write an XML parser from scratch.
  • l10n-merge only appends entities, but that’s fine 95% of the time. The only counter-examples are DTDs including other DTDs.
  • People using l10n-merge need to manually maintain the merge dir. Pruning it via compare-locales is risky business if you specify the wrong path by accident, so I consider this a feature. But if you’re seeing Spanish in a French build, clobber the mergedir and build again :-)

July 12, 2009

oh my eyes, but good still

Filed under: Mozilla — Axel Hecht @ 2:19 pm

[Screenshot: builds for a change]

I’ve uploaded a snapshot of a build in progress on my test display, which gives a bit better insight into what’s possible to show about a build based on the information that buildbot has, or a build database had. The important pieces here compared to tinderbox are:

  • Builds in progress are associated with the check-in that triggered the build.
  • Builds in progress show individual build steps.
  • Finished builds are displayed in compact form. In this case, all builds end with warnings, and thus come back in a shade of orange.
  • Builds not yet started that are already requested are displayed on top.

I didn’t go into the detail of mentioning which builders have pending builds. This is mostly me waiting for django 1.1 and aggregation support, but in the end it’s as simple as a GROUP BY. Nor did I try to make that display visually pretty, hell no.
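
For what it’s worth, with django 1.1 aggregation that GROUP BY looks about like this (model names assumed):

    from django.db.models import Count

    # pending: requested but not yet started; assumes a BuildRequest model
    # with foreign keys to a builder and to the eventual build
    pending = (BuildRequest.objects.filter(build__isnull=True)
               .values('builder__name')
               .annotate(pending=Count('id')))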

In the context of our regular builds, it’s worthwhile mentioning what unit test and talos runs would look like, i.e., “builds” that are scheduled after the binary builds are done. Those wouldn’t show up until they’re actually scheduled, which is fair enough, as that is what’s actually happening. If a windows build fails, there won’t be unit test or talos runs. You wouldn’t end up in a situation where you think you’re done and you aren’t, though. (Talos not working this way aside, that needs thought due to different masters etc.) The actual builders don’t finish until they have triggered the spin-off runs, so you either see the binary build as still running, or you see the triggered runs as pending. Here, as soon as you don’t have anything running or pending and your boxens are green, you’re off the hook.

I also added microsummaries and RSS feeds for this view, so you can use the web to learn about the fire you lit.

June 10, 2009

Running into builds, just testing

Filed under: L10n,Mozilla — Axel Hecht @ 10:29 am

I’ve blogged previously on how to set up a staging environment to test the l10n build system, but I didn’t go into any detail on how to actually do builds in that set up. That shall be fixed.

I’m picking you up at the point where

twistd get-pushes -n -t 1 -s l10n_site.settings

is running stable. You probably want to tail -f twistd.log to follow what it’s doing. This piece is going to watch the hg repositories in stage/repos and feed all the pushes to those into the db. Next is to make sure that you actually get builds.

The first thing you need to do is to configure the l10n-master to access the same database as the django-site. Just make sure that the DATABASE_* settings in l10n-master/settings.py and django-site/l10n_site/settings.py match. The other setting to sync is REPOSITORY_BASE, which needs to match in both settings.pys. I suggest setting that to an empty dir next to the l10n-master. I wouldn’t use the stage/repos dir, mostly because I didn’t test that.
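
For reference, the matching pieces of the two settings.py files would look something like this (django 1.0-style settings; the values are just placeholders):

    DATABASE_ENGINE = 'mysql'          # or 'sqlite3'
    DATABASE_NAME = 'l10n_builds'
    DATABASE_USER = 'l10n'
    DATABASE_PASSWORD = 'secret'
    DATABASE_HOST = 'localhost'

    # the same (empty) directory in both files, next to the l10n-master
    REPOSITORY_BASE = '/home/you/l10n/repository-base'

Now you set up the master for buildbot, by running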

buildbot create-master .

inside the l10n-master clone. Next thing is to create a slave dir, which is best put next to the l10n-master. Despite what buildbot usually likes, this slave should be on the same machine that the rest is running on.

mkdir slave
buildbot create-slave slave localhost:9021 cs pwd

So much for the general setup. With the twistd daemon running to get the pushes, you can now switch on the master and the slave, by doing a buildbot start . in both dirs. I suggest that you keep a tail -f twistd.log running on the master. If you set things up to track the upstream repositories, start the master first, then the slave, and once both are up fine, start the twistd daemon for get-pushes.

Now let’s trigger some builds:

Open stage/workdir/mozilla/browser/locales/en-US/file.properties in your favorite editor, and do a modification. I suggest just doing a whitespace edit, or changing the value of an existing entity, as that is going to keep your localizations green. Check the change in to the working clone, and then push. The get-pushes output should show that it got a new push, and then on the next cycle, feed the push into the database. You’ll notice this by the output of an hg pull showing up in the log. On the next poll on the l10n-master, builds for all 4 locales should trigger. You should see an update of four builds on the waterfall, and 4 locales on the test tree on the local dashboard.
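
In commands, that trigger amounts to something like this, inside the working clone:

cd stage/workdir/mozilla
$EDITOR browser/locales/en-US/file.properties
hg commit -m "whitespace-only change to trigger builds"
hg push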

June 5, 2009

got l10n builds

Filed under: L10n,Mozilla — Axel Hecht @ 3:22 am

’nuff said? Not even remotely.

We’ve had l10n builds for as long as I’ve been working on l10n; actually, I got involved around the time when we started to do them upstream. They were always considerably better than each localizer doing their build at home on whatever (virus-infected) hardware they found, with help from other community members for the platforms they didn’t have. But in the cold light of day, it was more

That? Yeah, I know. That’s crap.

And I know you can hear my voice in your head right now :-).

Those days are gone. For a few days now we’ve been running Firefox and Fennec builds on the releng infrastructure that are actually sound builds made to serve our l10n community. Some highlights:

  • Builds are finished some 10 minutes after a localizer landing, on all three platforms.
  • There’s no deadlock between different locales, thanks to all l10n builds running on a pool of slaves.
  • Builds are “l10n-merged”, against the actual build that’s repackaged. Independent of missing strings or files, you have a build that can be tested.
  • No more race conditions between nightly and trunk source status.

The impact of this shouldn’t be underestimated. We are, for the first time in years, producing builds that allow a localizer to actually test immediately. Localizers can work incrementally: translate one feature, check in, test. No worries if something landed in en-US in the meantime, or whatnot. With the new builds, I have seen various localizers go from hundreds of missing strings to a tested build on two or three platforms in a matter of a few hours. Back in the day, that was the waiting time for the first build. The new locales all pull all-nighters to get their final bits in. They want to, and now they actually can.

I want to thank coop and armenzg for their great help in making this happen, and aki for porting it over to fennec. Of course thanks go to joduinn and sethb, too, for bearing with the ongoing meetings we have, trying to battle the crap down. To dynamis for the initial work on l10n-merge. Also thanks to bsmedberg and Chase for the initial work on both automation and build process, and to ted for the various reviews on making our build system catch up.

Finally, we’re not going to stop here. Armen is working on creating the necessary files to get l10n builds on a nightly update channel. Yep, you heard right, that’s where we are right now. I know that KaiRo is working on getting the goodness over to the comm-central apps. And yours truly is hacking on the dashboard together with gandalf, more on that in a different post.

June 1, 2009

L10n ecosystem in a fishbowl

Filed under: L10n,Mozilla — Axel Hecht @ 7:02 am

Building the infrastructure for our l10n builds is hard, mostly because it consists of a ton of things that you don’t have control over. We’re building 3 and a half applications: Firefox, Thunderbird, Fennec, and Sunbird for calendar. Firefox is built on three versions, one of which is still coming from CVS. Thunderbird is one version on CVS, one on hg. We’re touching some 170 hg repos, and a single check-in can do anything between no build, one build, or up to almost 200 builds. Rinse, repeat, yeah, 200 builds for a single landing. Worse than that, you don’t have any control over who’s landing what when where. Bottom line, you really can’t test a change in the l10n build automation reliably in our production setup.

You can create a fake ecosystem, though, and I’ll explain a bit how that works. Of course it doesn’t end up being trivial, but it’s contained. It’s not trying to cover the CVS branches, those would require a setup of bonsai, which I chicken away from. Take this post with a grain of salt, I assume there are some errors here as most of it is typed from memory.

As with any recipe, here are the list of ingredients:

  • A set of hg repositories, both for a fake application and some fake l10n repos.
  • An hg server serving those repositories (make that port 8001).
  • Some buildbot infrastructure working on top of these repositories that you’re trying to test.
  • Possibly an instance of the l10n dashboard presenting both the build and the l10n data.

The initial chunk is creating the repositories. I created a helper script, create-stage, that does that; it’s part of buildbotcustom/bin/l10n-stage. Its main purpose is to get the templates, hooks etc. that are part of our server-side setup on hg.mozilla.org, create some upstream repos for en-US and l10n, and push some initial content from a set of working clones. You call it like

python l10n-stage -l stagedir

The -l keeps the l10n repositories from pushing their initial content, which yields a scenario that is closer to what we have upstream, i.e., a flock of en-US pushes before the l10n repos start. This command creates a bunch of main repositories in the repos subdir of stagedir, and a bunch of working clones in workdir. It also creates a webdir.conf that you’ll use to run the local hg http server. Let’s run that now, in stagedir:

hg serve -p 8001 --webdir-conf=webdir.conf -t repos/hg_templates

Now you have a local setup of an application repository called mozilla, and 4 localizations in l10n: ab, de, ja-JP-mac, x-testing. They’re all equipped with the same hooks that we run on hg.m.o, in particular, they support pushlog.

Now on to the buildbot infrastructure. There’s a sibling script to create-stage, create-buildbot, which should create a master setup that is rather close to what we run on releng. It supports various degrees of parallelism for multiple slaves on three platforms, though it only does dummy builds, IIRC. I want to go into more detail on how to set up the new dashboard master, though.

The dashboard master is merely running compare-locales on the actual source repositories. It does come with our bonsai replacement pushes, though. That’s basically a pushlog spanning repositories, including file and branch indexing. Here’s the basic software components you’ll need:

  • django 1.0.x (1.1pre might work, too)
  • buildbot 0.7.10p1, older versions won’t work

and from hg.m.o, you’ll need compare-locales, locale-inspector, l10n-master, buildbotcustom and django-site.

Firstly, you set up the db. sqlite and mysql should both work; mysql is actively tested. Edit the various settings.py files to reference your db, with an absolute path if sqlite, and create the schema. The main entry point to the django site is l10n_site, go in there to run python manage.py syncdb. Another edit you want to do is to point REPOSITORY_BASE to a dir where you can stage another set of clones of your repos. I suggest not sharing the hg master repo dir here.

Next, create a buildbot master and a slave. You do that by running the buildbot create-master command on your local clone of l10n-master. You’ll need to adapt l10nbuilds.ini to the test set up,

[test]
app = browser
type = hg
locales = all
mozilla = mozilla
l10n = l10n
repo = http://localhost:8001/
l10n.ini = browser/locales/l10n.ini
builders = compare

I should put that into the repo somewhere, I guess.

Setting up the slave is trivial, you need to make sure it’s on the same machine, though. It will run on the django clones directly.

Before starting master and slave, make sure that all the deps are in your PYTHONPATH.

Last but not least, you need to get all the pushes from your repo setup into your django setup. First, you need to tell the db which repositories it’s supposed to get from where. I created sample data for the test app, which you can load by

python manage.py loaddata localhost

The repositories I use for the production environment are in hg_mozilla_org, fwiw. You fill the database with the actual push data by running a twisted daemon. Inside django-site, run

twistd get-pushes -n -t 1 -s l10n_site.settings

That will ping one repo per second, not update, with data from l10n_site.settings. Now you have everything set up, and you can start to edit en-US and l10n files in your workingdir, and push, and see how that changes your builds and dashboard.

The buildbot waterfall will be on port 8364, and with python manage.py runserver, the dashboard will show up on port 8000. None of this is setup to be on a production server at this point, but it’s good for testing.

Update: Forgot to mention that you need to bootstrap the repo lists.

December 19, 2008

working demo, waterfall and more

Filed under: L10n,Mozilla — Axel Hecht @ 4:50 pm

Thanks to Reed, I just put my latest pet project up on the l10n server.

It’s offering a new web interface for buildbot builds. It does so by feeding the status data that buildbot has into a database on the one end, and on the other end is a django app displaying that data.

The nice thing about this is that writing features (or fixing bugs) is “just webdev”. Compared to whatever you want to call tinderbox hacking.

There are already a few concepts demoed on the site. All urls are in flux, note the “stage” in them. But the principle should be obvious.

Firstly, you can get to a regular waterfall on waterfall. Yes, there are some time-sorting issues there. But it’s quick, which is cool. Compare it to the regular buildbot waterfall (didn’t bother checking which timerange that shows). And it offers a nice compromise (IMHO) for displaying detail or not. For finished builds, it shows one box per build. For builds in progress, it shows a box per step (it doesn’t show a box for the build for those steps, which is confusing). It has a blame column, too. Whenever you see a change number, that links to a new page, which lists all builds for that particular change. This one shows an en-US check-in with all locales turning red, for example. Another one just shows how things go green for Arabic again, as that localizer checked in.

For the l10n builds, that’s peanuts, but if you’re landing on a tree with real compiles, being able to follow all builds for your landing, and no others sounds cool.

And django comes with helpers for generating feeds, so creating a meaningful live bookmark to follow your own landing doesn’t seem like an unsolvable RFE.
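
To make that concrete, a sketch of such a feed with django’s syndication framework (written against the current class-based Feed API; the model names are assumptions):

    from django.contrib.syndication.views import Feed
    from status.models import Change   # hypothetical model

    class LandingFeed(Feed):
        title = 'Builds for your landing'
        link = '/stage/'

        def get_object(self, request, change_id):
            return Change.objects.get(pk=change_id)

        def items(self, change):
            return change.builds.order_by('-id')[:20]

        def item_title(self, build):
            return '%s #%s: %s' % (build.builder, build.buildnumber, build.result)

        def item_link(self, build):
            return '/stage/builds/%d/' % build.id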

There’s more. You can restrict the waterfall to only show builds for a particular slave. You can restrict the shown builds to only show builds with a particular property, say, the Macedonian builds, compared to all builds.

There’s obviously lots of room for improvement, the code is in the tinder app in my hg repo. Volunteers welcome.

October 21, 2008

l10n-merged linux builds on the l10n server

Filed under: L10n,Mozilla — Axel Hecht @ 2:43 pm

I reached another milestone on the l10n builds on the l10n server – reliable l10n depend builds.

A short recap on why they could not be reliable. Details are in Armen’s and John’s presentation in Whistler. First and foremost, l10n builds with missing strings break. They might start, or not, maybe even crash. Or just display the yellow screen of xml parsing death. Now, l10n builds are not really builds, but repackages of an en-US build. Between the time that the en-US build started, or, in hg, the revision it used, and the tip at the time when the en-US binary is finished and available, there can be further l10n-impact landings. We are using the nightly builds for the repackages throughout the whole day even, so the chance that the current en-US source doesn’t correspond to the nightly increases. So even if you know that a localization is good tip vs tip, you can’t say if it’s breaking the previous nightly or not. Huh? Look at the graphs in Armen’s and John’s presentation for more arrows going back and forth in time. ;-)

Enter bugs 452426 and 458014. 452426 added the changeset id to application.ini (thanks Ted), and 458014 refactored browser/locales/Makefile.in with additional logic to extract that info for the build system. I got that one landed yesterday, so we can now get the source stamp of mozilla-central for a firefox build.

Right, good catch, this doesn’t work for comm-central builds. I’ll leave it up to them to figure out how to reproduce the plethora of repos they have.

So far, so good. You download the nightly, unpack, run ident (the rule to extract the changeset id). Now you go back to your source tree, hg update to that revision, and run compare-locales against that. We’d be able to reliably say “works” or “better don’t touch”.
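
The ident part is really just reading back the changeset that bug 452426 put into application.ini in the unpacked build; in python, that’s roughly:

    from ConfigParser import SafeConfigParser  # python 2

    ini = SafeConfigParser()
    ini.read('firefox/application.ini')        # path inside the unpacked nightly
    print ini.get('App', 'SourceStamp')        # the mozilla-central changeset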

We promised more, and more pieces came together today.

With reliable compare-locales code, one can not only detect missing strings, but also add missing strings to files. Think about a CPP step, nothing permanent, nothing gets landed upstream. But just for the needs of this particular build, you’d have something that has all strings. Not all translated, some padded from en-US. That works. compare-locales has been able to do merges for a while now, storing the changed files into a separate location. Mostly because I consider changing the source to be evil. So what about missing files? Nothing. Good files? Nothing. How does the build pick up files from merges, l10n, and en-US then?

By rewriting make-jars.pl, enter JarMaker.py. Among overall readability improvements and removing XPFE hacks, JarMaker.py offers to pick up l10n files from a list of top-level source dirs. It offers another cute feature, by writing both chrome and extension manifests at once. Now, with bug 458014, we don’t have to run the libs phase for installers and langpack separately. (I never got why we do that until I rewrote make-jars.pl, actually.) The rewrite of JarMaker.py was preceded by rewriting Preprocessor.py, so that all of the jar generation can happen in a single python process.

Starting from today, all of this came together with my installation of buildbot on the l10n server.

This gives us

  • builds on push, i.e., feedback within 5-10 minutes (real stats pending)
  • comparisons of the l10n tip against both
    • the en-US tip (for the upcoming nightly)
    • the en-US changeset of the previous nightly (for the existing nightly, with l10n-merge)
  • html-ified output for both of those
  • updates for the dashboard

and last, but not at all least, a

  • working build, even for partial translations.

Find 60 3.1b2pre linux builds on the l10n server.

Thanks to Armen, I used a few of his new makefile targets for download and upload, he did a bunch of work for the sourcestamps-in-application.ini on cvs, too. Thanks to Ted, the poor fellow had to review all my rewrites and Makefile dependencies foo, and did some patches, too. hgpoller stuff not to forget.

TODO:

  • silme will offer even more reliable merges
  • nightly scheduler for all locales (currently I only build on l10n and en-US l10n-impact changes) (*)
  • MARs (update packages)
  • comm-central
  • more Makefile foo to pick up more missing files from en-US when in doubt… (*)
  • … or at least document the core set of required files (*)

I won’t take most of those, fwiw. Possibly only the (*) ones.

Sources are in my tooling repository, and there’s an updated version of compare-locales, 0.6, on pypi. No drastic changes here, just some path fixes, mostly for Windows.

