Mozilla « Axel Hecht

February 15, 2013

Risk management for releases at scale

Filed under: L10n,Mozilla — Tags: L10n, Mozilla — Axel Hecht @ 8:26 am

Let me share some recent revelations I had. It all started with the infamous Berlin airport. Not the nice one in Tegel, but the BBI desaster. The one we’ve thought we’d open last year, and now we don’t know which year.

Part of the newscoverage here in Germany was all about how they didn’t do any risk analysis, and are doomed, and how that other project for the Olympics in London did do risk analysis, and got in under budget, ahead of time.

So what’s good for the Olympics can’t be bad for Firefox, and I started figuring out the math behind our risk to ship Firefox, at a given time, with loads of localizations. How likely is it that we’ll make it?

Interestingly enough, the same algorithm can also be applied to a set of features that are scheduled for a particular Firefox release. Locales, features, blockers, product-managers, developers, all the same thing :-). Any bucket of N things trying to make a single deadline have similar risks. And the same cure. So bear with me. I’ll sprinkle graphs as we go to illustrate. They’ll link to a site that I’ve set up to play with the numbers, reproducing the shown graphs.

The setup is like this: Every single item (localization, for exampe) has a risk, and I’m assuming the same risk across the board. I’m trying to do that N times, and I’m interested in how likely I’ll get all of them. And then I evaluate the impact of different amounts of freeze cycles. If you’re like me, and don’t believe any statistics unless they’re done by throwing dices, check out the dices demo.

Anyway, let’s start with 20% risk per locale, no freeze, and up to 100 locales.

Ouch. We’re crossing 50-50 at 3 items already, and anything at scale is a pretty flat zero-chance. Why’s that? What we’re seeing is an exponential decay, the base being 80%, and the power being how often we do that. This is revelation one I had this week.

How can we help this? If only our teams would fail less often? Feel free to play with the numbers, like setting the successrate from 80% to 90%. Better, but the system at large still doesn’t scale. To fight an exponential risk, we need a cure that’s exponential.

Turns out freezes are just that. And that’d be revelation two I had this week. Let’s add some 5 additional frozen development cycles.

Oh hai. At small scales, even just one frozen cycle kills risks. Three features without freeze have a 50-50 chance, but with just one freeze cycle we’re already at 88%, which is better than the risk of each individual feature. At large scales like we’re having in l10n, 2 freezes control the risk to mostly linear, 3 freezes being pretty solid. If I’m less confident and go down to 70% per locale, 4 or 5 cycles create a winning strategy. In other words, for a base risk of 20-30%, 4-5 freeze cycles make the problem for a localized release scale.

It’s actually intuitive that freezes are (kinda) exponentially good. The math is a tad more complicated, but simplified, if your per-item success rate is 70%, you only have to solve your problem for 30% of your items in the next cycle, and for 9% in the second cycle. Thus, you’re fighting scale with scale. You can see this in action on the dices demo, which plays through this each time you “throw” the dices.

Now onwards to my third revelation while looking at this data. Features and blockers are just like localizations. Going in to the rapid release cycle with Firefox 5 etc, we’ve made two rules:

Feature-freeze and string-freeze are on migration day from central to aurora
Features not making the freeze take the next train

That worked fine for a while, but since then, mozilla has grown as an organization. We’ve also built out dependencies inside our organization that make us want particular features in particular releases. That’s actually a good situation to be in. It’s good that people care, and it’s good that we’re working on things that have organizational context.

But this changed the risks in our release cycle. We started off having a single risk of exponential scale after the migration date (l10n). Today, we have features going in to the cycle, and localizations thereof. At this point, having feature-freeze and string-freeze being the same thing becomes a risk for the release cycle at large. We should think about how to separate the two to mitigate the risk for each effectively, and ship awesome and localized software.

I learned quite a bit looking at our risks, I hope I could share some of that.

Comments (1)

September 27, 2012

Language packs are restartless now

Filed under: L10n,Mozilla — Tags: L10n, Mozilla — Axel Hecht @ 4:04 am

Language packs are add-ons that you can install to add additional localizations to our desktop applications.

Starting with tomorrow’s nightly, and thus following the Firefox 18 train, language packs will be restartless. That was bug 677092, landed as 812d0ba83175.

To change your UI language, you just need to install a language pack, set your language (*), and open a new window. This also works for updates to an installed language pack. Opening a new window is the workaround for not having a reload button on the chrome window.

The actual patch turned out to be one line to make language packs restartless, and one line so that they don’t try to call in to bootstrap.js. I was optimistic that the chrome registry was already working, and rightfully so. There are no changes to the language packs themselves.

Tests were tricky, but Blair talked me through most of it, thanks for that.

(*) Language switching UI is bug 377881, which has a mock-up for those interested. Do not be scared, it only shows if you have language packs installed.

Comments (12)

July 20, 2012

Why l10n tools should be editors instead of serializers

Filed under: L10n,Mozilla — Tags: L10n, Mozilla — Axel Hecht @ 9:18 am

If your tool serializes internal state instead of editing files, it’ll do surprising things if it encounters surprising content. Like, turn

errNotSemicolonTerminated=Named character reference was not terminated by a semicolon. (Or “&” should have been escaped as “&”.)

into

errNotSemicolonTerminated=Named character reference was not terminated by a semicolon. (Or “” should have been escaped as “amp;”.)

And that’s for a string the localizer never touched.

(likely narro issue 316)

Comments (2)

Notes on SemWiki

Filed under: Mozilla — Axel Hecht @ 9:01 am

Semantic Wiki is nice, but it’s hard to wrap one’s head around. Thus, writing down some notes-to-non-self.

Most importantly, start with paper. SemWiki isn’t very forgiving if you reconsider. Once you’ve made some headway with paper, set up mediawiki locally. I’ve thrown my db away three times so far because I did do reconsider. FYI, that’s a tad tricky, here’s what works for me:

kill the db and recreate it
move LocalSettings.php away
load index.php, follow the configuration ’til you have a db, ignore the download
php maintainance/update.php to get the semwiki tables
Special:SMWAdmin to do tables and data update (log in again)
php maintainance/runJobs.php to speed up the data update

Using Special:CreateClass is nice, as it’s doing a whole lot of things for you. There is one thing it doesn’t, for Properties linking to Pages, it won’t ask you for the Form to create/edit those pages. Special:CreateProperty does, but that’s of little help. You can add that later by editing the Property page, and adding a Has default form like

This is a property of type [[Has type::Page]]. It links to pages that use the form [[Has default form::Aisle Milestone]].

Property name is the wiki page name of a property, Field name is the human readable name used in the forms created, btw. If you loose the mapping between field and property name, edit the form, and explicitly specify the property with {{{field|Summary|property=Has summary}}} or the like. Though, more likely, you dropped [[Has summary::{{{Summary|}}}]] in the template, I replaced that with just {{{Summary|}}}, which breaks stuff.

Also, you don’t want to use ‘/’ in your form names, that breaks editing URLs from your referring pages.

Oh, and yes, you don’t need to add the namespaces like Template: etc when entering the names in the semwiki forms, those are prepended automagically.

Another drawback of Special:CreateClass is, it’ll do all its work asynchonously, so you’ll need to wait a while ’til things are up for you to actually enter your data. Locally, you can speed things up with php maintainance/runJobs.php.

I’m still torn on how much I’d really like to use the free text, right now I’m using a Text property to create a summary that I can put in a prominent part of the template.

One more trick, if you’re using largely prefixed pages, like we do on wikimo, you can get pretty descent sorting if you edit the Template to have a category hook like

[[Category:Aisle/Use Case|{{#titleparts: {{PAGENAME}} | | -1 }}]]

This is for Aisle, for which I ended up creating my own semweb instead of using our standard feature pages. They have a ton of stuff I don’t need, and don’t have a few things I hope to use.

Comments Off

June 16, 2012

Why I deleted my account on facebook

Filed under: Mozilla — Tags: Mozilla — Axel Hecht @ 9:40 am

I opened a facebook account a good while ago to be present in communication channels where our community is. I’ve closed that account, with a host of pending “friend” requests from community members, and here’s why.

On one hand, there’s all the “you wanna be a friends of a northern-german elderly guy with a click” thing, and the “what’s the value of my life at NASDAQ”. But if facebook would have worked for me, I would have continued to bite that bullet.

The real reason I left is that facebook doesn’t work for me. My role at Mozilla is to talk and engage with people all over the world. Many of them came back with friend requests on facebook. Their friends came back with friend requests. Now, most of what they’re putting up on facebook is targetted to their social circle, and, in their language. Which is cool, but my facebook feed just turned into a series of stuff I can’t read.

And being your friend and then mute you? That’d be just rude.

I haven’t communicated there really. So, I’ve stopped being on facebook. I’m still in that 14-day grace period, but I cleared my cookies to make it through that.

If you want to keep in touch, subscribe to this blog, or catch me on twitter. And then there’s irc and email, of course.

I intend to stay on twitter for the forseeable future. I’m not fond of their promo-tweets, but I enjoy the asymmetric nature of connections there. I have no plans to join other social networks at this point.

Comments (5)

June 9, 2012

Rapid releases and the l10n dashboard are friends now

Filed under: L10n,Mozilla — Tags: elmo, L10n, Mozilla — Axel Hecht @ 12:07 pm

Wait a second, we’re on the rapid release schedule for almost a year now, and 9 releases. How can the l10n dashboard be friends with the trees only now?

Well, I’ve hacked and lied and tweaked and spoofed the data for a year. No more.

The obvious changes are:

Localizers as well as drivers can now see how far behind their work is, in release cycles
channel migration code is actually not a lie, and can be taken over by release management

On the team page, you’ll now see something like

You’ll notice the difference between the Current sign-off with the green check-mark, and the fx14 one with the looking glass. In the past, we’ve shown the green check-mark for both, now we’re actually showing the version that we’re using instead of the current one, and the looking glass is there to indicate that the localizer should actually look into this. There’s been a good deal of confusion about this, and I sure hope that this will resolve it a good deal.

There’s a ton of follow-up work, for one on elmo. This bug has been blocking a lot of other patches and work, both for the localizer-facing parts as well as the release infrastructure-facing parts.

More so, the state of localizations changes from “n missing strings, probably in this bucket” to “didn’t update since fx12”. That’s changing how we guide our work as l10n drivers a good deal. The impact between what we do and what localizers do becomes less anecdotal, and more science.

And there’s quite a few things we need to do, in particular for desktop.

Comments Off

June 7, 2012

compare-locales 0.9.6

Filed under: L10n,Mozilla — Tags: compare-locales, L10n, Mozilla — Axel Hecht @ 4:32 am

I’ve updated compare-locales with two important fixes:

License header fix for ini files, bug 760998
l10n-merge now works with multiple errors per file, bug 756448

I’ve also updated the license to MPL2.

Update your local installs with the usual commands, like

pip install -U compare-locales

The l10n dashboard is already running the new code, I’ll file a bug to get the production builds updated probably tomorrow.

Comments (1)

March 23, 2012

Migrating to the rapid release process

Filed under: L10n,Mozilla — Tags: L10n, Mozilla — Axel Hecht @ 10:45 am

Wait, what, migrating to the rapid release process? Aren’t we, like, doing that? Well, not in the data models that drive the l10n dashboard. What follows is two-fold, for one, why would I be hacking on a patch for half a year? But also, there are some interesting technical tidbits on how to do intensive data migrations in a django project. I’ll link to the actual code in full later, right now the patch is still in flux. I’ll need to write this stuff down to get a review on the patch, so why not here.

I’ve blogged about data models before, but here’s a quick glossary:

Tree models a set of repositories to run automated tests against, namely, compare-locales. AppVersion models the thing we’re shipping, say Firefox 3.6 or Firefox 13.

When gandalf and I originally designed this part of the dashboard, the relationship between what we build and what we shipped was static, kinda like

Small caveat, for AppVersions we’re not building right now, the old tree is stored as lasttree. We’ll need that data further down.

The rapid release process now lets AppVersions jump from Tree to Tree like little birds. So we need to have some intermediary that models that relationship, with start and end time. More like

Sorry for the ugliness, but that’s the pretty part so far.

Let’s start with doing that migration. Be warned, it’s migration 1 out of 4.

This first migration covers the sql schema, and persists the existing links between AppVersions and Trees. We’ll want to drop two ForeignKey columns for tree and lasttree from AppVersion, and add a ManyToMany table for the new relationship. Plus constraints and indices, sure. As I don’t speak sql, I want to use the ORM as much as I can, which sounds like one couldn’t. Enter a fake migration app that mimics the import pieces of our shipping app. It’ll contain a subset of AppVersion to the extent we need it, and the intermediate model, AppVersionTreeThrough. This needs to be really fake code, as you shouldn’t rely too much on the version of the code on disk to be really what you need for this db migration step. I cheat a bit on that, though. The import trick here is that the fake AppVersion contains both the old and the new fields, and that the fake AppVersionTreeThrough matches the one you migrate to in all flags and fields.

Now, getting django to eat a fake app is tricky. You need a module for the app, and that module needs to have a models module. Both need to be in sys.modules, and they need to have the __file__ properties set. Just because django is paranoid about equality of code and thinks it needs to verify location on disk. But hey, I can spoof all of that. Python ftw. Then you create a meta class, copying most data for db and management from your original app, and with that meta, create Model subclasses.

Ok, so now we have python code to do the migration, but we don’t have the SQL tables and columns yet. Let’s get our fingers dirty with that.

from django.core.management import color, sql
style = color.no_style()
sql.sql_all(mig_module.models, style, ship_connection)

This is going to return a list of all CREATE TABLE, CREATE INDEX, ALTER TABLE ADD CONSTRAINT sql commands for our fake migration models. The migration code inspects that, and creates the many-to-many table, adds constraints and indices, and tweaks the CREATE TABLE statement for AppVersion to just pull out the SQL definitions for the columns. Those get inserted into ALTER TABLE ADD COLUMN statements, and then we execute all that.

At this point, the database contains the old columns with the old data, and the new empty tables (and columns).

Now migrate the data, with the help of the our intermediate classes that can create the django orm magic. As we’re still in the first phase of our migration, let’s not overrotate and just make links between AppVersion and Tree over all time for those that are currently bound, and for those that used to be linked (read lasttree), make the end time just now(). We’ll fix that later. That’s a nice and easy loop.

Now we’ve used our migration app to the extent we need. It’d be nice if we could just leave it with that, but we’ve got to tear it down, because otherwise the validation step will complain about multiple apps referring to the same tables. Module caches in django are tough, the following code does that at least for 1.3:

    # prune "migration_phase_1" app again
    del settings.INSTALLED_APPS[-1]
    from django.db.models import loading
    loading.cache.app_models.pop('migration_phase_1', None)
    loading.cache.register_models('migration_phase_1')  # clear cache
    # empty sys modules from our fake modules
    sys.modules.pop(mig_name)
    sys.modules.pop(mig_models_name)

And, yes, now we can DROP stuff, at least when using mysql. sqlite doesn’t do that, and in contrast to peterbe, I don’t mind postgres ;-). Of course, as django adds constraints with rather arbitrary names, the best thing we can do is inspect the database for the actual names of those, and then we drop’em, and the columns.

And if you think this blog post is too long, you’re right. Let’s talk about the migrations 2-4.

The second migration just inspects our actual builds, and adjusts the start and end times for stuff that’s not on the rapid release cycle, and old.

The third migration fights the past. To make our mismatching data model work for the rapid cycle, I replicated a lot of history data each time we migrated appversions, to a newly created set of appversions for that cycle. Now that our data model gets that right, this migration searches for that data (Signoff entries, and their associated Action ones), and deletes them.

The forth migration sets the start and end times for the rapid release cycle, including the off-times for Firefox 5, and moves the Signoff entries from the long running aurora AppVersion to the respective AppVersions on aurora at the time. Not too bad, but really loads of weird data, and it gets worse every six weeks :-).

Phew. You made it. Now all I need to do is to fix the code that uses all that data.

Comments (1)

February 15, 2012

compare-locales 0.9.5

Filed under: L10n,Mozilla — Tags: compare-locales, L10n, Mozilla — Axel Hecht @ 8:05 am

Busy times for compare-locales, there’s another release out the door.

New in this release are a significant rewrite of the Properties parser. A lot less regular expressions, a lot more performance in bad situations. Thanks to glandium for poking me hard with a patch. That patch didn’t work, but at least it got my butt to it. Comparing bn-IN is now down to 23 secs for 3 minutes+.
The next big thing is that I now run checks on entities that are keys, too. That doesn’t seem to have caused any regressions, but look out for new false positives. On the plus side, if you use ‘&’ as accesskey, you’ll get an error report instead of a ysod.
Finally, I added support for mpl2 license headers, so we’re all set there.

As usual, update with pip install -U compare-locales.

Comments Off

January 27, 2012

What’s a glossary term?

Filed under: L10n,Mozilla — Tags: L10n, Mozilla, tools — Axel Hecht @ 5:23 pm

I’m hacking on some tool that indexes the localizable strings in our apps.

One of the fall-outs could be a glossary tool, i.e., which terms in Firefox, Thunderbird, etc should localizers bother to get consistently translated.

Which raises an interesting question, where do you draw the line? What’s a good metric to use to define a glossary? Are there glossary-based applications that don’t need a cut-off at all?

Insights welcome.

Comments (4)

« Newer Posts — Older Posts »

Axel Hecht Mozilla in Your Language

February 15, 2013

September 27, 2012

July 20, 2012

June 16, 2012

June 9, 2012

June 7, 2012

March 23, 2012

February 15, 2012

January 27, 2012