Can’t you graph that graph

I’m going to just recreate blame, he said. It’s going to be easy, he said.

We have a project to migrate the localization of Firefox to one repository for all channels, nick-named cross-channel, or x-channel in short. The plan is to create one repository that holds all the en-US strings we need for Firefox and friends on all channels. One repository to rule them all, if you wish. So you need to get the contents of mozilla-central, comm-central, *-aurora, *-beta, *-release, and also some of *-esr?? together in one repository, with, say, one toolkit/chrome/global/customizeToolbar.dtd file that has all the strings that are used by any of the apps on any branch.

We do have some experience with merging the content of localization files as part of l10n-merge which is run at Firefox build time. So this shouldn’t be too hard, right?

Enter version control, and the fact that quite a few of our localizers are actually following the development of Firefox upstream, patch by patch. That they’re trying to find the original bug if there’s an issue or a question. So, it’d be nice to have the history and blame in the resulting repository reflect what’s going on in mozilla-central and its dozen siblings.

Can’t we just hg convert and be done with it? Sadly, that only converts one DAG into another hg DAG, and we have a dozen. We have a dozen heads, and we want a single head in the resulting repository.

Thus, I’m working on creating that repository. One side of the task is to update that target repository as we see updates to our 12 original heads. I’m pretty close to that one.

The other task is to create a good starting point. Or, good enough. Maybe if we could just create a repo that had the same blame as we have right now? Like, not the hex or integer revisions, but annotate to the right commit message etc? That’s easy, right? Well, I thought it was, and now I’m learning.

To understand the challenges here, one needs to understand the data we’re throwing at any algorithm we write, and the mercurial code that creates the actual repository.

As of FIREFOX_AURORA_45_BASE, just the blame for the localized files for Firefox and Firefox for Android includes 2597 hg revisions. And that’s not even getting CVS history, but just what’s in our usual hg repository. Also, not including comm-central in that number. If that history was linear, things would probably be pretty easy. At least, I blame the problems I see in blame on things not being linear.

So, how non-linear is that history. The first attempt is to look at the revision set with hg log -G -r .... . That creates a graph where the maximum number of parents of a single changeset is at 1465. Yikes. We can’t replay that history in the target repository, as hg commits can only have 2 parents. Also, that’s clearly not real, we’ve never had that many parallel threads of development. Looking at the underlying mercurial code, it’s showing all reachable roots as parents of a changeset, if you have a sparse graph. That is, it gives you all possible connections through the underlying full graph to the nodes in your selection. But that’s not what we’re interested in. We’re interested in the graph of just our nodes, going just through our nodes.

In a first step, I wrote code that removes all grandchildren from our parents. That reduces the maximum number of parents to 26. Much better, but still bad. At least it’s at a size where I can start to use graphviz to create actual visuals to inspect and analyze. Yes, I can graph that graph.

The resulting graph has a few features that are actually already real. mozilla-central has multiple roots. One is the initial hg import of the Firefox code. Another is including Firefox for Android in mozilla-central, which used to be an independent repository. Yet another is the merge of services/sync. And then I have two heads, which isn’t much of a problem, it’s just that their merge commit didn’t create anything to blame for, and thus doesn’t show up in my graph. Easy to get to, too.

Looking at a subset of the current graph, it’s clear that there are more arcs to remove:

Anytime you have an arc that just leap-frogs to an ancestor, you can safely remove that. I indicated some in the graph above, and you’ll find more – I was just tired of annotating in Preview. As said before, I already did that for grandchildren. Writing this post I realize that it’s probably easy enough to do it for grandgrandchildren, too. But it’s also clear from the full graph, that that algorithm probably won’t scale up. Seems I need to find a good spot at which to write an explicit loop detection.

This endeavour sounds a bit academic at first, why would you care? There are two reasons:

Blame in mercurial depends on the diff that’s stored in the backend, and the diff depends on the previous content. So replaying the blame in some way out of band doesn’t actually create the same blame. My current algorithm is to just add the final lines one by one to the files, and commit. Whitespace and reoccurring lines get all confused by that algorithm, sadly.

Also, this isn’t a one-time effort. The set of files we need to expose in the target depends on the configuration, and often we fix the configuration of Firefox l10n way after the initial landing of the files to localize. So having a sound code-base to catch up on missed history is an important step to make the update algorithm robust. Which is really important to get it run in automation.

PS: The tune for this post is “That Smell” by Lynyrd Skynyrd.

On updating the automation behind l10n.m.o

Or, how to change everything and nobody sees a difference.

Heads up: All I’m writing about here is running on non-web-facing VMs behind VPN.

tl;dr: I changed 5 VMs, landed 76 changesets in 7 repositories, resolving 12 bugs, got two issues in docker fixed, and took a couple of days of downtime. If automation is your cup of tea, I have some open questions at the end, too.

To set the stage: Behind the scenes of the elmo website, there’s a system that generates the data that it shows. That system consists of two additional VMs, which help with the automation.

One is nick-named a10n, and is responsible for polling all those mercurial repositories that we use for l10n, and to update the elmo database with information about these repositories as it comes in. elmo basically keeps a copy of the mercurial metadata for quicker access.

The other is running buildbot to do the actual data collection jobs about the l10n status in our source repositories. This machine runs both a master and one slave (the actual workhorse, not my naming).

This latter machine is an old VM, on old OS, old Python (2.6), never had real IT support, and is all around historic. And needed to go.

With the help of IT, I had a new VM, with a new shiny python 2.7.x, and a new storage. Something that can actually run current versions of compare-locales, too. So I had to create an update for

Python 2.6 Python 2.7.x
globally installed python modules virtualenv
Django 1.4.18 Django 1.8.x
Ubuntu CentOS
Mercurial 3.7.3 Mercurial 4.0.1 and hglib
individual local clones unified local clones
No working stage docker-compose up

At the same time, we also changed hg.m.o from http to https all over the place, which also required a handful of code changes.

One thing that I did not change is buildbot. I’m using a heavily customized version of buildbot 0.7.12, which is incompatible with later buildbot changes. So I’m tied to my branch of 0.7.12 for now, and with that to Twisted 8.2.0. That will change, but in a different blog post.

Unified Repositories

One thing I wanted and needed for a long time was to use unified clones of our mercurial repositories. Aside from the obvious win in terms of disk usage, it allows to use mercurial directly to create a diff from a revision that’s only on aurora against a revision that’s only on beta. Sadly, I did think otherwise when I wrote the first parts of elmo and the automation behind it, often falling back to default instead of an actual hash revision, if I didn’t know anything ad-hoc. So that had to go, and required a surprising amount of changes. I also changed the way that comparisons are triggered, making them fully reproducible. They also got more robust. I used to run hg id -ir . to get the revision, which worked OK, unless you had extension errors in stdout/stderr. Meh. Good that that’s gone.

As I noted, the unified repositories also benefit doing diffs, which is one of the features of elmo for reviewing localizations. Now that we can just use plain mercurial to get those diffs, I could remove a bunch of code that created diffs between aurora and beta by creating diffs between each head and some ancestor, and then sticking those diffs back together. Good that that’s gone.

Testing

Testing an automation with that many moving parts is hard. Some things can be tested via unit tests, but more often, you just need integration tests. I still have to find a way to write automated integration tests, but even manual integration tests require a ton of set-up:

  • elmo
  • MySQL
  • ElasticSearch
  • RabbitMQ
  • Mercurial upstream repositories
  • Mercurial web server
  • a10n get-pushes poller
  • a10n data ingestion worker
  • Buildbot master
  • Buildbot slave

Doing this manually is evil, and on Macs, it’s not even possible, because Twisted 8.2.0 doesn’t build anymore. I used to have a script that did many of these things, but that’s …. you guessed it. Good that that’s gone. Now I have a docker-compose test setup, that has most things running with just a docker-compose up. I’m still running elmo and MySQL on my host machine, fixing that is for another day. Also, I haven’t found a good way to do initial project setup like database creations. Anyway, after finding a couple of bugs in docker, this system now fires up quickly and let’s me do various changes and see how they pass through the system. One particularly nice artifact is that the output of docker-compose is actually all the logs together in one stream. So as you’re pushing things through the system, you just have one log to watch.

As part of this work, I also greatly simplified the code structure, and moved the buildbot integration from three repositories into one. Good that those are gone.

snafus

Sadly there were a few bits and pieces where my local testing didn’t help:

Changing the URL schemes for hg.m.o to https alongside this change triggered a couple of problems where Twisted 8.2 and modern Python/OpenSSL can’t get a connection up. Had to replace the requests to websites with synchronous urllib2.urlopen calls.

Installing mercurial in a virtualenv to be used via hglib is good, but WSGI doesn’t activate the virtualenv, and thus PATH isn’t set. My fix still needs some server-side changes to work.

I didn’t have enough local testing for the things that Thunderbird needs. That left that setup burning for longer than I anticipated. The fix wasn’t hard, just badly timed.

Every now and then, Django 1.8.x and MySQL decide that it’s a good idea to throw away the connection, and die badly. In the case of long-running automation jobs, that’s really hard to prevent, in particular because I still haven’t fully understood what change actually made that happen, and what the right fix is. I just plaster connection.close() into every other function, and see if it stops dying.

On Saturday morning I woke up, and the automation didn’t process Firefox for a locale on aurora. I freaked out, and added tons of logging. Good logging that is. Best logging. Found a different bug. Also found out that the locale was Belarus, and that wasn’t part of the build on Saturday. Hit my head against a wall or two.

Said logging made uncaught exceptions in some parts of the code actually show up in logs, and discovered that I hadn’t tested my work against bad configurations. And we have that, Thunderbird just builds everything on central, regardless of whether the repositories it should use for that exist or not. I’m not really happy yet with the way I fixed this.

Open Questions

  • Anyone got taskcluster running on something resembling docker-compose for local testing and development? You know, to get off of buildbot.
  • Initial setup steps for the docker-compose staging environment are best done … ?
  • Test https connections in docker-compose? Can I? Which error cases would that cover?

On the process to code

gps blogs a bunch about his vision on how coding on Firefox should evolve. Sadly there’s a bunch of process debt in commenting, so I’ll put my comments in a medium where I already have an account.

A lot of his thinking is based on Code First. There’s an earlier post with that title, but with less content on it, IMHO.

In summary, code first capitalizes on the fact that any change to software involves code and therefore puts code front and center in the change process.

I strongly disagree.

Code is really just one artifact of many of creating and maintaining a software product.

Or, as Shane blogged,

First rate hackers need process too

Bugzilla is much more than a review tool, it’s a store of good and bad ideas. It’s where we document which of our ideas are bad, or just not good enough. It’s where we build out our shared understanding of where we are, and where we want to be.

I understand how gps might come to his conclusions. Mostly because I find myself in a bucket that might be like his. Where your process is really just documenting that you agree with yourself, and that you’re going forward with it. And then that you did. Yeah, that’s tedious and boring. Which is why I took my hat as module owner more often lately, and landed changes without a bug, and with rs=foopy. Old guard.

Without applying my basic disagreement to several paragraphs over several posts, some random notes on a few of gps’ thinking and posts:

I agree that we need to do something about reviews. splinter’s better than plain text, but not quite there. Github PRs feel pretty horrible at least in the way we use them for gaia. As soon as things get slightly interesting, no status in either the PR nor bugzilla makes any sense anymore. I’m curious on MozReview/reviewboard, though the process to get things hooked up there is steep. I’ll give it a few more tries for the few things I do on Firefox, and improvements on code and docs. The one time I used it, I felt very new.

Much of the list gps has on process debt are things he cares about, and very few other people. In particular newcomers won’t even know to bother, unless they happen to across that post by accident. The other questions are not process debt, but good things to learn, and to learn early.

Update to the l10n team page

I’ve given the team pages on l10n.mozilla.org a good whack in the past few days. Time to share some details and get feedback before I roll it out.

The gist of it: More data in less screen space, I just folded things into rows, and made the rows slimmer. Better display of sign-off status, I separated status from progress and actions. Actions are now ordered chronologically, too.

The details? Well, see my recording where I walk you through:


View it on youtube.

Comments here or in bug 1077988.

Create your own dashboard

We have a lot of data around localizations, but it’s hard to know what people might be looking for.

I just switched a new feature live, edit your own dashboard.

You can select branches of products, as well as the localizations you’re interested in, and get data you want.

Say you’re looking for mobile and India. You’d want Firefox OS and Firefox for Android aka Fennec. The latter is actively localized on aurora, so you’d want the gaia tree and fennec_aurora. You want Assamese, Bengali, Gujarati…. and 9 other languages. Select gu and pa, too, ’cause why not.

Or are you keen on Destop in Latin America? Again we’re looking at Aurora, so fx_aurora is our tree of choice this time. Locales are Spanish in its American Variants, and Brazilian Portuguese.

Select generously, you can always reduce your selection through the controls on the right side of the resulting dashboard.

Play around, and compare the Status and History columns. Try to find stories, and share them in the comments below.

A bit more details on fx-aurora vs Firefox 32. Right now, Firefox 32 is on the Aurora channel and repository branch. Thus, selecting either gives you the same data today. In six weeks, though, 32 is going to be on beta, so if you bookmark a link, it’d give you different data then. That’s why you can only select one or the other.

Progress on l10n.mozilla.org

Today we’re launching an update to l10n.mozilla.org (elmo).

Team pages and the project overview tables now contain sparklines, indicating the progress over the past 50 days.

Want to see how a localization team is doing? Now with 100% more self-serve.

If the sparklines go up like so

good progress

the localization is making good progress. Each spark is an update (either en-US or the locale), so sparks going up frequently show that the team is actively working on this one.

If the sparklines are more like

not so much

then, well, not so much.

The sparklines always link to an interactive page, where you can get more details, and look at smaller or larger time windows for that project and that locale.

You should also look at the bugzilla section. A couple of bugs with recent activity is good. More bugs with no activity for a long time, not so much.

Known issues: We still have localizations showing status for central/nightly, even though those teams don’t work on central. Some teams do, but not all. Also, the sparklines start at some point in the past 50 days, that’s because we don’t figure out the status before. We could.

A tale of convert and convert

Or, how I made converting gaia to gaia-l10n suck less.

Background: For Firefox OS, we’re exposing a modified repository to localizers, so that it’s easier to find out what to work on, and to get support from the l10n dashboards. Files in the main gaia repository on github like

apps/browser/locales/browser.en-US.properties

should become

apps/browser/browser.properties

and the localizable sections in manifest.webapp,

{
…
  "locales": {
    "en-US": {
      "name": "Browser",
      "description": "Gaia Web Browser"
    }
…
}

are exposed in manifest.properties as

name: Browser
description: Gaia Web Browser

We’re also not supporting git on the l10n dashboard yet, so we need hg repositories.

I haven’t come across a competitor to hg convert on the git side yet, so I looked on the mercurial side of life. I started by glancing of the code in hgext/convert in the upstream mercurial code. That does a host of things to get parents and graphs right, and I didn’t feel like replicating that. It doesn’t offer hooks for dynamic file maps, though, let alone content rewriting. But it’s python, and it’s open-source. So I forked it.

With hg convert. Isn’t that meta? That gives me a good path to update the extension with future updates to upstream mercurial. I started out with a conversion of mercurial 2.7.1, then removed all the stuff I don’t need like bzr support etc. Then I made the mercurial code do what I need for gaia. I had to disable some checks that try to avoid commits that don’t actually change the contents, because I don’t mind that happening. And last but not least I added the filemap and the shamap of the initial conversion of hgext/convert, so that future updates don’t depend on my local disk.

Now I could just run hg gaiaconv and get what I want. Enter the legacy repositories for en-US. We only want fast-forward merges in hg, and in the conversion to git. No history editing allowed. But as you can probably guess, the new history is completely incompatible with the old, from changeset one. But I don’t mind, I hacked that.

I did run the regular hg gaiaconv up to the revision 21000 of the integration/gaia-central repository. That ended up with the graph for revision 4af36780fb5d.

gaia-central conversion

I pulled the old conversion for v1-train, which is the graph for revision ad14a618e815.

old en-US for v1-train

Then I did a no-op merge of the old graph into the new one.

merge

That’s all good, but now future conversions via gaiaconv would still pick up the non-merged revision. Well, unless one just edits the generated shamap, and replaces all references to 4af36780fb5d with cfb28f851111. And yes, that actually works.

more conversions post-merge

Tadaaa, a fully automated conversion process, and only forward merges.

Repositories involved in this post:

git-merge 2013

I spent Friday and Saturday on git-merge, an unconference on git. Thursday was developer day, for core contributors to git itself, and libgit2/jgit. I didn’t go there. Friday was “user” day, and Saturday was hackday. I figured it might be useful to go to the userday, and turned out, it was. It wasn’t all that much about users at all. It was strictly unconference, so people would take the stage and give a quick lightening talk, and later in the day we had a few breakout sessions. The most user-like people were folks migrating to git just now, and they had a good deal of folks to talk to in the breakout session.

The rest of Friday was actually library and tool hackers talking about what they’re doing. There are notes on github.com/git-merge/user-day, but I do want to hightlight a few.

imerge is probably the most interesting to gecko hackers. It allows you to merge or rebase intensive histories with conflicts incrementally, with tons of automation. It even allows to do test runs on the merges it does automatically. If you ever resolved the same conflict 10 times in one rebase, this is for you.

libgit2 was under the hood of many, with core contributors of the library itself being there, plus the maintainers of at least the C# and the python bindings. There was also a good deal of tooling based on jgit, a pure-java implementation of git.

Much debate on java vs not, actually. And Cmd-F1. Little conflict between git and hg.

I also got to enjoy the github backend talk by vmg, with Ernie, Bert, and smoke through chimneys. I had seen a recording of it before, but it was well worth it seeing it live.

I joined the breakout session about teaching git, and had the pleasure to be in a group that tought git in various parts of the globe. Yes, that might be relevant to me, so that was good exercise. I talked with Scott Schacon about localized version of the git book, and localized github docs, too.

Given that I had the chance to talk and co-hack with all those tooling guys, I decided to drop by on Saturday for the hack day. I wanted to take the opportunity to talk about my weird repository rewrite questions, and so I did.

Saturday was great. I only got 20 lines of code written, but I finally understood what git is about in the back-end. There’s loads of hackery that you can do, and funny enough, both I and Ben ended up hacking on a repository rewrite algorithm, which helped me a lot, both talking about the structure of the code, as well as vetting the approach. His code in C# is actually in a PR. Worth a look for people that want to see how to hack tooling on top of that binding. I tried the same in python, and succeeded to some extent. David Ibáñez helped me a great deal. But it also showed me the difference between the C# API and the python bindings. If only mono was easy to get up on the mac.

On the conference itself, it was set up at the Radisson Blu next to the Berliner Dom, which is a nice venue for that size. Wifi worked, food and beer were there. It’s sad that many people claimed tickets and didn’t show up. Now you know what you missed, and what you made other people miss. Bad bunny, no chocolates! Thanks again for Jen for setting things up great.

Thanks to all the folks at git-merge for making it a great event. See you in Berlin…

Let me shake your hand

Timely for the 15 years of mozilla, let me share this with you.

I was wearing my Firefox hoodie in a store in San Francisco last week. I was trying to find out what to buy, and a young guy came by. I did let him pass, but he didn’t find what he wanted, and left the store. Just a moment later he returned and said

I’ve been using Firefox for 10 years, and I’d like to shake your hand.

First of all, feel your hand shaken. It’s obviously us and not me.

But more importantly, this happened in the US. For the longest time, folks from the US would tell those stories coming back from Europe. We’re finally there, across the globe.

And when it comes down to the mission, we’re obviously winning. I love to tell people that I work for Mozilla. And mostly everybody knows that we exist, and more importantly, knows that they have a choice between various browsers. And they’re making a choice.

So here’s to 15 more years to choice and innovation on the internet.

Risk management for releases at scale

Let me share some recent revelations I had. It all started with the infamous Berlin airport. Not the nice one in Tegel, but the BBI desaster. The one we’ve thought we’d open last year, and now we don’t know which year.

Part of the newscoverage here in Germany was all about how they didn’t do any risk analysis, and are doomed, and how that other project for the Olympics in London did do risk analysis, and got in under budget, ahead of time.

So what’s good for the Olympics can’t be bad for Firefox, and I started figuring out the math behind our risk to ship Firefox, at a given time, with loads of localizations. How likely is it that we’ll make it?

Interestingly enough, the same algorithm can also be applied to a set of features that are scheduled for a particular Firefox release. Locales, features, blockers, product-managers, developers, all the same thing :-). Any bucket of N things trying to make a single deadline have similar risks. And the same cure. So bear with me. I’ll sprinkle graphs as we go to illustrate. They’ll link to a site that I’ve set up to play with the numbers, reproducing the shown graphs.

The setup is like this: Every single item (localization, for exampe) has a risk, and I’m assuming the same risk across the board. I’m trying to do that N times, and I’m interested in how likely I’ll get all of them. And then I evaluate the impact of different amounts of freeze cycles. If you’re like me, and don’t believe any statistics unless they’re done by throwing dices, check out the dices demo.

Anyway, let’s start with 20% risk per locale, no freeze, and up to 100 locales.

80%, no freeze, up to 100

Ouch. We’re crossing 50-50 at 3 items already, and anything at scale is a pretty flat zero-chance. Why’s that? What we’re seeing is an exponential decay, the base being 80%, and the power being how often we do that. This is revelation one I had this week.

How can we help this? If only our teams would fail less often? Feel free to play with the numbers, like setting the successrate from 80% to 90%. Better, but the system at large still doesn’t scale. To fight an exponential risk, we need a cure that’s exponential.

Turns out freezes are just that. And that’d be revelation two I had this week. Let’s add some 5 additional frozen development cycles.

80%, 5 freezes, up to 100

Oh hai. At small scales, even just one frozen cycle kills risks. Three features without freeze have a 50-50 chance, but with just one freeze cycle we’re already at 88%, which is better than the risk of each individual feature. At large scales like we’re having in l10n, 2 freezes control the risk to mostly linear, 3 freezes being pretty solid. If I’m less confident and go down to 70% per locale, 4 or 5 cycles create a winning strategy. In other words, for a base risk of 20-30%, 4-5 freeze cycles make the problem for a localized release scale.

It’s actually intuitive that freezes are (kinda) exponentially good. The math is a tad more complicated, but simplified, if your per-item success rate is 70%, you only have to solve your problem for 30% of your items in the next cycle, and for 9% in the second cycle. Thus, you’re fighting scale with scale. You can see this in action on the dices demo, which plays through this each time you “throw” the dices.

Now onwards to my third revelation while looking at this data. Features and blockers are just like localizations. Going in to the rapid release cycle with Firefox 5 etc, we’ve made two rules:

  • Feature-freeze and string-freeze are on migration day from central to aurora
  • Features not making the freeze take the next train

That worked fine for a while, but since then, mozilla has grown as an organization. We’ve also built out dependencies inside our organization that make us want particular features in particular releases. That’s actually a good situation to be in. It’s good that people care, and it’s good that we’re working on things that have organizational context.

But this changed the risks in our release cycle. We started off having a single risk of exponential scale after the migration date (l10n). Today, we have features going in to the cycle, and localizations thereof. At this point, having feature-freeze and string-freeze being the same thing becomes a risk for the release cycle at large. We should think about how to separate the two to mitigate the risk for each effectively, and ship awesome and localized software.

I learned quite a bit looking at our risks, I hope I could share some of that.