Axel Hecht

May 30, 2011

Coming soon: cross-repository network graphs for hg.m.o

Filed under: Mozilla — Tags: protovis — Axel Hecht @ 8:18 am

Did you miss the ability to look at our source code and figure out who’s working on what where? Thought that only github can do that?

Let me give you a sneak peek at what I’ve been hacking on over the weekend.

What’s the big picture of our mainline code development?

You can see the blue line of development along mozilla-central, and then branch points for the release branches, and now for beta (miramar), and aurora. That’s pretty broad, and intentionally so. If you’re interested in the back and forth on a changeset level, this is how the branch point of fx5 beta looks:

Why does it say aurora? Because hg doesn’t know what you’re looking for. I determine what was branched off of what by looking at pushes, wherever a particular changeset was pushed first, wins. As beta branched off of aurora later, you see that this was the branch of aurora at the time.

Sadly, you don’t see the more interesting detail, that after that branch, the developer tools branch merged. If the database in the backend was tracking our project repos, you’d see that.

Of course, there’s also an l10n side to this. I get many questions on what to merge and where and so, and it’s hard to see what ended up in which repo, and if things are different. Let’s follow one of the l10n repos at large:

There’s even more details on the rapid branches for this one, like so:

Many of the same landings, but with different hg changesets, and then a merge. That merge didn’t make it to miramar, though. Which is OK, that’s a one-off anyway.

Now that I wetted your appetite, bad news: Nothing of this is live yet. I’ll actually need to make a patch at least for the API in bug 659260, get review and get it live. Also, this is currently code that’s run as part of the l10n dashboard, which isn’t really intended to cover all our sources. If we want this at large, it’ll need liberation, and the resources that comes with that. The actual code is pretty easy to liberate, though.

And, graphs are made with protovis, including a custom Network-based layout to do DAGs.

Comments (2)

May 10, 2011

Reviewing sign-offs, slightly different

Filed under: L10n,Mozilla — Tags: L10n — Axel Hecht @ 3:39 am

Opening the magic box of l10n admin stuff:

We’re doing sign-offs, y’know? Localizers hit the l10n dashboard and click a button to say “this revision is good to ship”. Which is cool, because then they don’t need approval for every patch for release branch fixes.

And I’m reviewing the signoffs. Sounds all good, and well proven.

Enter the new release cycle. What’s new? This is a small update, on a quick turnaround. So I can’t do what I did for previous releases, and just not review the first sign-off. That was just an early beta (for most locales) and had a 1000 new strings. Sounded fair. Anyway, now we’re just doing 30 strings, so doing an incremental review against what’s on 4.0.1 is in order. So what does that mean?

I need to get the revision that we’re shipping on 4.0.1. For sign-offs that update one branch, that’s all hooked up in the UI, but not for 4.0.x to 5. That’s bug 655943.
The revision on 4.0.1 is on l10n-mozilla-2.0, the signoff on 5 is on a revision of l10n/mozilla-aurora. Neither need to exist in the other repo, so you can’t just use plain hg commands on a repo. That’s bug 655942.

Now, I wouldn’t be me if I wouldn’t script myself out of it, here’s the gist of it. And yes, this blog post is as much code comments as there are for that one.

Comments Off

April 20, 2011

Back home, a good deal lighter

Filed under: Mozilla — Axel Hecht @ 1:42 am

Next time you see me, I’ll look a tad different. That bump on my head got removed, surgery yesterday went fine, got home today. I’ll be taking it slow today, but should be up to speed tomorrow.

Thanks to all the good wishes.

Comments (11)

April 11, 2011

Being a localizer in the rapid release cycle

Filed under: L10n,Mozilla — Tags: L10n, Mozilla — Axel Hecht @ 6:15 pm

We’re changing to a 6-week release train model, and this is going to impact how localizers do their contributions. The following scheme has been cycled in .planning for a bit, so this is what we’ll be doing. We’ll adapt that if needed, of course, but based on experience with the next cycle or two.

Recap on the rapid release cycle: en-US developers work on mozilla-central, as they used to, and every 6 weeks, we’ll pull their contributions to another repository, called mozilla-aurora. That repository is string frozen. String changes only land in this repository as part of the merge from central to aurora. After another 6 weeks, the content goes to yet another repository, mozilla-beta. Corresponding to those, there’s l10n/mozilla-aurora and l10n/mozilla-beta. And now you know. Find a glossary at the end of this post.

There are two different localizer schemes: Early birds and friends of string freeze. Read the following descriptions and pick one for your individual localization team.

Early Birds are those localization teams that are happy to follow the mozilla-central content quickly and make sure that all issues relating to localizing that code are found and fixed. We already have a few of those that have built their reputation among our hackers to have good input to follow. We don’t need a lot of those, but the ones we have are crucial to make the plan work, and have code that is properly localizable at any time on aurora. You’ll be following the fx_central tree on the l10n dashboard to catch up on changes.

Friends of String Freeze are those teams that prefer to have stable content to localize with a decent time window to act on it. Many of our localization teams are in this group. If you’re in this group, you’ll set your calendar alarm to the next window, hg pull -u on your mozilla-aurora clone, your l10n/mozilla-aurora clone, localize, push, test, fix, push, sign-off. Then you set your calendar to the next 6-week cycle, and you’re all set. The expectation here is that the amount of strings will be rather low, so a day of l10n plus testing and fixing is fine. Usually, you should be able to deliver a great localization for the next version of Firefox in some 3 days. Firefox 5 right now is some 30 strings, other releases will be a good deal bigger. But nowhere close the 1.2k strings of Firefox 4. You’ll be watching the fx_aurora tree on the l10n dashboard to see the status of your localization.

Sign-offs will happen on aurora, in rare cases on beta. The setup where we work towards release is aurora.

What about the beta repositories? Well, I hope to not see a necessity to land on l10n/mozilla-beta for the most part. You should expect that changes you make on l10n/mozilla-beta will be dropped once we do the next update from aurora, so you want to have the fixes on both aurora and beta, if applicable. But really, you want to be good on aurora. Then beta will be fine and no hassle.

How that maps to mercurial work:

For the Friends of String Freeze, you’ll not need to worry about anything other than pulling on both repos every cycle. We’ll take your content from l10n/mozilla-aurora to l10n/mozilla-beta, and may very well at some point stop doing l10n-central builds at all for you. Just keep things simple here.

For the Early Birds, we’ll rely on you self-identifying and doing a tad of extra work. You’ll be in best shape to merge your contributions from l10n-central to l10n/mozilla-aurora, making sure that the result has all your fixes from both central and aurora, where you want them. You’re techy-geeky-savvy anyways, so that’s allright. If at some point, we learn that there’s a pattern that benefits from automation, we’ll check in on that when we get there, too. You shouldn’t have to worry about getting content on l10n/mozilla-beta anymore than the rest, though.

Glossary:
mozilla-central is the mercurial repository that en-US code is landed to as development makes progress.
l10n-central is the tree of mercurial repositories that the early-bird localizers use as development makes progress.
central is short for either, or both, of mozilla-central and l10n-central, depending on context.

The terms around mozilla-aurora, l10n/mozilla-aurora, and aurora map to their corresponding terms for central, same for mozilla-beta, l10n/mozilla-beta, and beta.

Update: Fixed the links to map to the new and stable repository locations.

Comments Off

November 24, 2010

compare-locales 0.9.1 is out

Filed under: L10n,Mozilla — Tags: L10n, Mozilla — Axel Hecht @ 7:30 am

I released compare-locales 0.9.1 yesterday on pypi. Do the regular

easy_install -U compare-locales

to update your local copy.

This update includes two bug-fixes compared to 0.9,

Don’t warn about XML-defined entities like &, bug 604404
Ensure that merged entities have a trailing newline, bug 612619

In particular the latter will make our l10n-merge code more stable. Sadly, we actually need to fix all the newly-reported errors in all stable branches and apps before we can update the production tag. Errors make compare-locales fail, and rightfully so. And fail is bad for release builds that don’t merge, also rightfully so.

Comments Off

November 15, 2010

As sure as logs are logs

Filed under: Mozilla — Tags: build, buildbot, Mozilla — Axel Hecht @ 12:04 pm

… or not.

As promised, I’ll write a bit about build logs today. You’ll see what our logs are, and, to begin with, I’ll take you on a tour through buildMessage to explain how the logs we have end up being what you see served off of tinderbox.

First off, buildbot is basically the same thing as any regular gecko app, one main thread and loads of callbacks. So when reading on, all your spontanous reactions are good.

The buildMessage code does:

synchronous IO to load all logs of a build into memory, basically up to some 70M
synchronous string handling to paste all that data together, with some extra padding
synchronous compression of the resulting string
synchronous base64 encoding of the compressed string

All on the main thread, all in one go, blocking. All of that to give you a single lengthy unformatted blob of text. Why?

Because our build logs are actually not a single lengthy unformatted blob of text, which is what tinderbox wants.

Let’s have a peek into what our build logs are, really. In my previous posts, I introduced you to the concept of build steps. They’re really the basic entity of work to be done for a build. Now, the logs are stored in buildbot pretty much in how the data comes, that is, each log is associated with a step, and the storage is happening as the chunks arrive. Commonly, that’d be stdout and stderr data coming from shell commands run on the slave. The information about which stream the data is on is persisted, too, as is the order, so any log looks like this, basically:

Step reference	header	length	data
	stdout	length	data
	stdout	length	data
	stdout	length	data
	stderr	length	data
	stdout	length	data

As most of you aren’t among the few priviledged ones to actually look at the real logs, I’ve set up a fake log page for you to take a look. It’s an l10n repack, mostly because they’re somewhat small in both step count and log size, and because I’m used to them. Here’s the actual make step highlighted. You can see the introduction being shown in blue, which is the common color for header chunks. Buildbot just uses that channel to show setup and shutdown information on the step. Then there’s the actual make output in black. If there was something on stderr, it’d be styled in red. Sorry, I didn’t quickly come up with something that has stderr.

The first take-away is that you can get to just the build output of the step you’re interested in.

If you’re nostalgic, you can check the checkbox for tinderbox, the css style sheet changes to show you what you’d get from tinderbox. Try to find the information again?

One further detail, there can be more than one log per step. Buildsteps that set build properties quite commonly have two logs, one that keeps track of the command that got run, and another that keeps track of the actually changed build properties. You can look at an example in the builddir step. The boring last line is the second log.

Log files are really not all that complicated, and much more useful than what we get back from tinderbox. Let’s look at some of the pros:

Log files come in as the build goes. This enables buildbot to publish build logs in almost realtime. There’s little-to-no cost for that, too, a simple node.js proxy can ensure that only one log is read at any time. Another benefit is, one can archive logs incrementally, removing the current stress on the masters to publish more data than they want to chew in one go.

Log files are per task. As the logs are associated with a step, which has a name and a builder, there’s pretty rich information available on what the data in question is actually about. Think about hg-specific error parsers for one step, ftp-specific ones for the next, and mochitest-specific ones for the one after that. All in one build. If we’d archive the raw data, we can easily improve our parsers and be compatible with old logs. Or add new steps to the build process without fear to break existing log parsers.

Tinderbox can still be fed. Even if we’re not sending out tinderbox log mails from the masters, we can still do the processing out of band in an external process or even external machine, offload the masters, and not enforce us to change all infrastructure in one go.

There is a hard piece, too, storage. Build logs are plenty, and they’re anywhere from a dozen bytes to 70M. Within the same build, even. There a hundreds of thousands small files, and thousands of really large ones. I hope that adding some information on what our build logs really are helps to spike a design discussion on this. If to compress, on which level. Retention, per step type, even? Store as single files, in one dir, or in a hierarchy, or as tar balls? Or all of the above as part of retention? Is hbase a fit?

Comments Off

November 12, 2010

Counting sourcestamps, changes, and faking data

Filed under: Mozilla — Tags: build, buildbot, Mozilla — Axel Hecht @ 8:31 am

As a follow up to my previous post on my digging through our build status, I want to look with a bit more detail, pretend it’d all be simple and what it could be, and, well, add the promised chocolate to coconut. Bounty.

Let’s look at the actual data for two and half landings. First, I’ll start with a rather simple landing by roc, revision 1b43… on mozilla-central. Let me summarize the builds real quick:

1 changeset in 1 push.

27 change rows in status db.

16 different branch names.

106 sourcestamps in status db.

245 builds.

That’s a lot, because, what we’re really interested in is

1 push, 245 builds.

Talk is cheap, but what’s really cheap is manipulating other peoples database, so while Vettel was running in circles in Brazil, I was running circles in the db and manually stitched things back together. The result is still coconut, but with chocolate, so here is the same url in bounty.

1 source, 1 change, 253 builds.

Wait, what, not 245? No, 253, because, well, there are more disconnects in the status, so the query in the database doesn’t catch them all. That’s what you need manual stitching for. Also, finding the right sourcestamp to keep isn’t trivial.

Which is why I only did it for a few builds. Sorry, you’ll not find a lot of builds that are stitched together, so enjoy the guided tour through the few shiny places.

During my needlework I came across another set of changes which are worthwhile to include into today’s tour. It’s two pushes, by khuey and vlad. Let’s give Count Count a rest and look at them with chocolate right away, the builds for khuey’s revision and vlad’s. What you’ll notice is that some of the builds for khuey aren’t there, but lumped together with vlad’s. Why’s that?

Our build infrastructure really doesn’t know about pushes. As I’ve detailed in my previous post, there are sourcestamps and changes, but no further grouping. At this point it’s really a design decision on whether the buildbot changes are hg pushes or hg changesets. This decision is currently in favor of hg changesets, which may just be right. At that point, the scheduling logic that puts changes into builds needs to put extra care into creating sourcestamps for what we intend to get builds for, and to keep those sourcestamps apart. The current implementation puts anything coming in within three minutes into the same sourcestamp, which is somewhat of a load limiter.

Anyway, back to chocolate. When you looked at the pages, did you realize that once you add it, you’re almost at a tinderboxpushlog page? Right, it could be that “easy”.

What’s between reality and chocolate? Well, sendchange. That’s a buildbot architecture component that allows shell commands to insert changes into buildmaster. It’s rather limited in the data it can transport, which is why we loose data on the way. There’s an alternative feature called trigger, which doesn’t. There is an open ticket to make that span different buildmasters, but given how much Mozilla invested into schedulerdb, let’s pray it’s easy to fix. Filed bug 611670.

Update: Changed links to l10n community server.

Comments Off

November 10, 2010

Looking at the internals of our builds

Filed under: Mozilla — Tags: build, buildbot, Mozilla — Axel Hecht @ 9:34 am

Chris Atlee has put up database dumps of both the scheduler and the status databases. These databases are the most detailed and (almost, status db is not) first class information on what our builds are really doing. The current code on top of that is all pylons-based, and I am, as many other mozillians, part of the django shop. So for one I figured “let’s see if I can make django read this”.

I can. It’s a bit rough, though, as some of the tables for many-to-many relationships don’t contain a primary key column. Django really doesn’t like that, and thus there are some things that you cannot do without those columns. Most of it works just fine, though. This holds at least for the status db, which I looked at in more detail, but the scheduler db ain’t much different. Filed bug 611014.

The code for this is up at django_moz_status on github.

Of course, me being able to talk to the db with a python shell won’t help you much, right? So I’ve spent a few more hours to actually create a really rough website on top of it, which I want to share with you.

Coconut. Hard to open, and once you get there, you may not like it. I have a thing with project names.

Coconut is a bunch of django views on top of django_moz_status, held together by site_demo. You can see it in action (for now) on the l10n community server. This view is exposing three concepts of buildbot, and how they play together:

Sourcestamps (first column): Every build in buildbot has a sourcestamp, and a sourcestamp can have multiple builds. The sourcestamp knows a branch, a revision, and a list of changes going into the builds associated with it.

Changes (second column): Changes are the external “real life” events that may or may not trigger builds. In this view, you see a few list of changes that look like a push to hg (and are just that), as well as a plethora of changes by Mr. sendchange, and Mr. sendchange-unittest. If you remove some query params on the above URL, you can also see a bunch of sourcestamps without changes. Those are nightlies.

Builds (last column): Each sourcestamp can have multiple builds, I’m just showing the builder name (a symbolic short name), the buildnumber, and the result as color. The third column is actually a guess on the platform of the build, based on the platform build property. If that’s not set, unknown is used.

Which brings me nicely to another two pieces of our build infrastructure that has been hard to look at so far. Build steps and build properties. Surf along? Let’s look at a build.

The first section of the build pages shows some general information, builder, buildnumber, status. The start and end time, how long the build took. Also it lists the buildbot master, and the category of the builder. Categories allow to filter for builds, sadly, a builder can only be in one category.

Next up is build steps. Each step in a build is an item of work, and an entity of failure. Different steps can handle failures differently, too. You can see that the build starts with a flock of steps that do administrative tasks on the slave. You can see which fragment of time of the build that step took by looking at the bar in the second column. You’ll see that the majority of that build went into the compile step. And that that passed. And after that compile, there’s a bunch of adminstrative stuff again.

There are two things that you do not see. One is, each of those steps has build logs attached to them. Commonly one, but possibly more. I’ll talk more about logs in a different post. The other thing is, steps can change the build properties. Which is to say, the build properties which are shown at the end of the build page are not static, but change during the build run.

Build properties? Right, within buildbot, several objects can have properties, among them, builds. You’ll find things like the buildnumber, the slavename, the branch, buildername (pretty). You’ll also find a host of things around the packaging of the build. Quite generally, our build try to put things that are needed for the build itself or for logic around into build properties.

The end of the page is reiterating which changes are associated with the sourcestamp for this build.

Let me stress your patience once more and invite you to visit a build with a failed step. In this build page you can see how the clobber step worked fine, and took quite some time, and the actual status of the build is due to the actual test step failing with a warning.

Now this post is already pretty lengthy, so I’ll take a break here and invite you to go in and click back and forth a bit, and to do some url hacking. If you think this is rough and you’re having a hard time figuring out what’s why, I’ll do a follow up post on how to add chocolate to coconut.

PS: the database this instance is working on is a snapshot that ends in August, details may be different today. I shrunk the database, too, only the last 10k builds still have the buildsteps.

Update: Changed the links to the l10n community server.

Comments (2)

November 5, 2010

MultilingualWeb: Workshop in Madrid

Filed under: L10n,Mozilla — Tags: L10n, mlw, Mozilla — Axel Hecht @ 8:24 am

So I’ve been at the W3 MultilingualWeb Workshop in Madrid last week, and I guess there are a few things worth reporting.

MultilingualWeb is a project bound to host 4 workshops to bring people from different fields together to see how standards and best practices (existing and not) can help the web. Being mozilla, we don’t really need to add that it’s beyond just one language, right? The effort is strongly supported by the European Union, so there’s a bias towards participants in these workshops being from Europe, though the folks by themselves certainly talk beyond that.

The crowd in Madrid was really diverse, standards people, government (EU and India), researchers, content, and, well, browsers. The browsers people were Charles McCathieNevile (Opera), Jan Nelson and Peter Constable (Microsoft), and me (Mozilla). There we no folks from webkit-based browsers.

Interesting bits and pieces:

I guess other people made that experience lately, too, but I welcome the way that MSFT is positioning themselves lately. Now they just need to compare beta builds to beta builds, and, (insider joke) while we hack on canvas, you learn JS:

- ctx = canvas1.getContext("2d"); + ctx = document.getElementById('canvas1').getContext("2d");

Still need to actually look at the results in competing browsers, and not on my font-broken OSX, but we’re not doing too bad.

Gecko should really start using CLDR data for stuff like plurals, dates, calendars, lists. I should also really read up on ES’ i18n_api.

It was interesting to see common questions on what’s a language from Denis Gikunda, who’s working on l10n for google in sub-saharan Africa. Now that Anloc is coming in with their localizations, we’re getting more exposed to how the history of those languages is so different from European ones.

Facebook’s Ghassan Haddad reported on a few interesting things. Like Zuckerman coming into his interview with “you can’t slow our development down”. Interesting about this is that the resulting infrastructure is far from zero-impact on the development. There are quite some restrictions on what content you can put up, and you have to add syntactic sugar all over, too. Go check their docs for details. Also, they’re not slowing down the publishing of localizations.

We got a bit of detail in the discussion about vandalism in fb l10n. They initially relied on community there, but when they got hit, they took down the localized sites until they had tooling support. Ghassan didn’t come forward with details on what they do, though.

They are doing something conceptually similar to l20n to localize their social messages like “A is now friend with B, C”, to make those depend on all the genders. IIRC, they call it string or entity explosion. Didn’t get to ask any questions about this one, sadly.

Most of the science people talked about processes that all sound very good for the data we get from feedback in Firefox 4 betas. Natural language processing with trends detection, “translation” of SMS Spanish into Spanish, and much more. Sadly, there’s nothing shrink wrapped that we could just use, but there’s interest in creating a project to find out, maybe for Firefox Next?

One thing that felt slightly odd was the Semantic Web. I thought that was dead, but there’s still optimism around that. Maybe semantics that help machine translation make a case for it, I’m not sure. Also, there seems to be more structured data coming to the “public web”, and the algorithms that transform the “hidden web” into the “public web” could more easily add markup than human authors would. Still, there wasn’t much hope in the browser people. Luckily, the browser doesn’t really need to do anything but creating a DOM, and passing markup around for machine translation engines taking benefit from additional semantics.

Last but not least, I did finally get to spend some quality time with our Madrid community, thanks to the folks for taking me out twice. I had a great time, and sorry that my English speaking tempo aligned with your Spanish speaking tempo way too often :-).

Comments Off

October 4, 2010

Releasing compare-locales 0.9, aka, the value checker

Filed under: L10n,Mozilla — Tags: L10n, Mozilla — Axel Hecht @ 9:06 am

I’ve just uploaded version 0.9 of compare-locales onto pypi. It’s finally the version that does all the fancy value checks that I’ve been talking about for a while, and that some of the localizers have seen flying by in their bugmail.

Here’s what it does:

For DTDs, I create fake xml docs, and try to parse them. This should find encoding errors, as well as unbalanced XML tags or stray ‘&’ ampersands. There’s one thing that’s tricky, and that is references to entities. I do get the list of entities from en-US, so I do have a good idea which should work (really, please). On the other hand, referencing other entities may not be an error. &rdquot; for example could be totally fine. If referenced in an XHTML document, that is. Not if it was included in a XUL document. Of course both breeds could include the same DTD file. I can’t really tell, so I’ve added a new category of reporting, called warnings.

For properties files, I check a bunch of printf tricks. Some of those are warnings, some of those are errors. Which is which basically depended on code-inspection. I also did some heuristics based on comments referencing the plural docs to check for our plurals-special variable handling.

Outstanding are the installer variable checks still, didn’t want to hold back this release for that. They’re somewhat tricky in the details and yet more tedious to get right than the other checks.

What does that mean for localizers? You wanna get the error count down to zero. The warnings count may or may not go down to zero, that’s your call.

The new version isn’t in public use anywhere yet, the deployment will go like this:

Get a round of public feedback on this release.
Use on the dashboard (likely gonna happen when I do the 2.0 branch dance, too).
Try to get the new version used on the build system.

Please give the new version a bit of pounding in your local l10n-merge builds, too. It should strip entities with errors from your localization, and merge in en-US strings for that.

Feel free to file bugs on issues you find.

Comments (1)

« Newer Posts — Older Posts »

Axel Hecht Mozilla in Your Language

May 30, 2011

May 10, 2011

April 20, 2011

April 11, 2011

November 24, 2010

November 15, 2010

November 12, 2010

November 10, 2010

November 5, 2010

October 4, 2010