Axel Hecht Mozilla in Your Language

July 20, 2012

Why l10n tools should be editors instead of serializers

Filed under: L10n,Mozilla — Tags: , — Axel Hecht @ 9:18 am

If your tool serializes internal state instead of editing files, it’ll do surprising things if it encounters surprising content. Like, turn

errNotSemicolonTerminated=Named character reference was not terminated by a semicolon. (Or “&” should have been escaped as “&”.)

into

errNotSemicolonTerminated=Named character reference was not terminated by a semicolon. (Or “” should have been escaped as “amp;”.)

And that’s for a string the localizer never touched.

(likely narro issue 316)

June 9, 2012

Rapid releases and the l10n dashboard are friends now

Filed under: L10n,Mozilla — Tags: , , — Axel Hecht @ 12:07 pm

Wait a second, we’re on the rapid release schedule for almost a year now, and 9 releases. How can the l10n dashboard be friends with the trees only now?

Well, I’ve hacked and lied and tweaked and spoofed the data for a year. No more.

The obvious changes are:

  • Localizers as well as drivers can now see how far behind their work is, in release cycles
  • channel migration code is actually not a lie, and can be taken over by release management

On the team page, you’ll now see something like

You’ll notice the difference between the Current sign-off with the green check-mark, and the fx14 one with the looking glass. In the past, we’ve shown the green check-mark for both, now we’re actually showing the version that we’re using instead of the current one, and the looking glass is there to indicate that the localizer should actually look into this. There’s been a good deal of confusion about this, and I sure hope that this will resolve it a good deal.

There’s a ton of follow-up work, for one on elmo. This bug has been blocking a lot of other patches and work, both for the localizer-facing parts as well as the release infrastructure-facing parts.

More so, the state of localizations changes from “n missing strings, probably in this bucket” to “didn’t update since fx12”. That’s changing how we guide our work as l10n drivers a good deal. The impact between what we do and what localizers do becomes less anecdotal, and more science.

And there’s quite a few things we need to do, in particular for desktop.

June 7, 2012

compare-locales 0.9.6

Filed under: L10n,Mozilla — Tags: , , — Axel Hecht @ 4:32 am

I’ve updated compare-locales with two important fixes:

  • License header fix for ini files, bug 760998
  • l10n-merge now works with multiple errors per file, bug 756448

I’ve also updated the license to MPL2.

Update your local installs with the usual commands, like

pip install -U compare-locales

The l10n dashboard is already running the new code, I’ll file a bug to get the production builds updated probably tomorrow.

March 23, 2012

Migrating to the rapid release process

Filed under: L10n,Mozilla — Tags: , — Axel Hecht @ 10:45 am

Wait, what, migrating to the rapid release process? Aren’t we, like, doing that? Well, not in the data models that drive the l10n dashboard. What follows is two-fold, for one, why would I be hacking on a patch for half a year? But also, there are some interesting technical tidbits on how to do intensive data migrations in a django project. I’ll link to the actual code in full later, right now the patch is still in flux. I’ll need to write this stuff down to get a review on the patch, so why not here.

I’ve blogged about data models before, but here’s a quick glossary:

Tree models a set of repositories to run automated tests against, namely, compare-locales. AppVersion models the thing we’re shipping, say Firefox 3.6 or Firefox 13.

When gandalf and I originally designed this part of the dashboard, the relationship between what we build and what we shipped was static, kinda like

Static relation between AppVersion and Tree

Small caveat, for AppVersions we’re not building right now, the old tree is stored as lasttree. We’ll need that data further down.

The rapid release process now lets AppVersions jump from Tree to Tree like little birds. So we need to have some intermediary that models that relationship, with start and end time. More like

Connect AppVersions and Trees through a model in time

Sorry for the ugliness, but that’s the pretty part so far.

Let’s start with doing that migration. Be warned, it’s migration 1 out of 4.

This first migration covers the sql schema, and persists the existing links between AppVersions and Trees. We’ll want to drop two ForeignKey columns for tree and lasttree from AppVersion, and add a ManyToMany table for the new relationship. Plus constraints and indices, sure. As I don’t speak sql, I want to use the ORM as much as I can, which sounds like one couldn’t. Enter a fake migration app that mimics the import pieces of our shipping app. It’ll contain a subset of AppVersion to the extent we need it, and the intermediate model, AppVersionTreeThrough. This needs to be really fake code, as you shouldn’t rely too much on the version of the code on disk to be really what you need for this db migration step. I cheat a bit on that, though. The import trick here is that the fake AppVersion contains both the old and the new fields, and that the fake AppVersionTreeThrough matches the one you migrate to in all flags and fields.

Now, getting django to eat a fake app is tricky. You need a module for the app, and that module needs to have a models module. Both need to be in sys.modules, and they need to have the __file__ properties set. Just because django is paranoid about equality of code and thinks it needs to verify location on disk. But hey, I can spoof all of that. Python ftw. Then you create a meta class, copying most data for db and management from your original app, and with that meta, create Model subclasses.

Ok, so now we have python code to do the migration, but we don’t have the SQL tables and columns yet. Let’s get our fingers dirty with that.

from django.core.management import color, sql
style = color.no_style()
sql.sql_all(mig_module.models, style, ship_connection)

This is going to return a list of all CREATE TABLE, CREATE INDEX, ALTER TABLE ADD CONSTRAINT sql commands for our fake migration models. The migration code inspects that, and creates the many-to-many table, adds constraints and indices, and tweaks the CREATE TABLE statement for AppVersion to just pull out the SQL definitions for the columns. Those get inserted into ALTER TABLE ADD COLUMN statements, and then we execute all that.

At this point, the database contains the old columns with the old data, and the new empty tables (and columns).

Now migrate the data, with the help of the our intermediate classes that can create the django orm magic. As we’re still in the first phase of our migration, let’s not overrotate and just make links between AppVersion and Tree over all time for those that are currently bound, and for those that used to be linked (read lasttree), make the end time just now(). We’ll fix that later. That’s a nice and easy loop.

Now we’ve used our migration app to the extent we need. It’d be nice if we could just leave it with that, but we’ve got to tear it down, because otherwise the validation step will complain about multiple apps referring to the same tables. Module caches in django are tough, the following code does that at least for 1.3:

    # prune "migration_phase_1" app again
    del settings.INSTALLED_APPS[-1]
    from django.db.models import loading
    loading.cache.app_models.pop('migration_phase_1', None)
    loading.cache.register_models('migration_phase_1')  # clear cache
    # empty sys modules from our fake modules
    sys.modules.pop(mig_name)
    sys.modules.pop(mig_models_name)

And, yes, now we can DROP stuff, at least when using mysql. sqlite doesn’t do that, and in contrast to peterbe, I don’t mind postgres ;-). Of course, as django adds constraints with rather arbitrary names, the best thing we can do is inspect the database for the actual names of those, and then we drop’em, and the columns.

And if you think this blog post is too long, you’re right. Let’s talk about the migrations 2-4.

The second migration just inspects our actual builds, and adjusts the start and end times for stuff that’s not on the rapid release cycle, and old.

The third migration fights the past. To make our mismatching data model work for the rapid cycle, I replicated a lot of history data each time we migrated appversions, to a newly created set of appversions for that cycle. Now that our data model gets that right, this migration searches for that data (Signoff entries, and their associated Action ones), and deletes them.

The forth migration sets the start and end times for the rapid release cycle, including the off-times for Firefox 5, and moves the Signoff entries from the long running aurora AppVersion to the respective AppVersions on aurora at the time. Not too bad, but really loads of weird data, and it gets worse every six weeks :-).

Phew. You made it. Now all I need to do is to fix the code that uses all that data.

February 15, 2012

compare-locales 0.9.5

Filed under: L10n,Mozilla — Tags: , , — Axel Hecht @ 8:05 am

Busy times for compare-locales, there’s another release out the door.

New in this release are a significant rewrite of the Properties parser. A lot less regular expressions, a lot more performance in bad situations. Thanks to glandium for poking me hard with a patch. That patch didn’t work, but at least it got my butt to it. Comparing bn-IN is now down to 23 secs for 3 minutes+.
The next big thing is that I now run checks on entities that are keys, too. That doesn’t seem to have caused any regressions, but look out for new false positives. On the plus side, if you use ‘&’ as accesskey, you’ll get an error report instead of a ysod.
Finally, I added support for mpl2 license headers, so we’re all set there.

As usual, update with pip install -U compare-locales.

January 27, 2012

What’s a glossary term?

Filed under: L10n,Mozilla — Tags: , , — Axel Hecht @ 5:23 pm

I’m hacking on some tool that indexes the localizable strings in our apps.

One of the fall-outs could be a glossary tool, i.e., which terms in Firefox, Thunderbird, etc should localizers bother to get consistently translated.

Which raises an interesting question, where do you draw the line? What’s a good metric to use to define a glossary? Are there glossary-based applications that don’t need a cut-off at all?

Insights welcome.

January 26, 2012

Web-based IDEs for Localization

Filed under: L10n,Mozilla — Tags: , , — Axel Hecht @ 9:16 am

There isn’t much news on the localization tool front that I started at MozCamp in Berlin, but I’ve got some more questions for the web tool guys among you. As any good project, a localization editor should stand on the shoulders of giants, so I’ve been looking at Orion, Cloud9/Ace, and etherpad-lite. All of them got parts right, and I try to gather some input what it’d take to bring home the rest. I’ve set up an etherpad each for you to type to,

A bit of context: Localization is editing code by people that don’t (necessarily) code. An editor needs to take an extra step to make breaking things hard, and editing the right pieces easy. There’s also a host of content assist available to suggest translations, spell checking, glossaries, etc. Which opens two paths: You offer forms, and your localizer will never learn what’s happening in real life, or you drive a code editor beyond what the programmer of that code editor ever needed herself. I would love to try the latter again, this time on the web.

So, if you have some experience with tweaking, wrestling, extending, stripping any of these, or with one I missed, I’d be thankful for your input.

January 20, 2012

compare-locales 0.9.4 released

Filed under: L10n,Mozilla — Tags: , — Axel Hecht @ 4:37 am

There’s yet another update to compare-locales, we’re now at 0.9.4. Please update your local installs with

pip install -U compare-locales

Changes since 0.9.3 are:

  • Catch % as error. Sadly, there’s not much more the parser reports than Invalid Token, but at least it says something. You need to escape that as %.
  • Stability fix, there was a crash on <!ENTITY "reference to &ƞǿŧ;-known entity">. Unicode is hard.

December 26, 2011

Minor update to compare-locales for mobile/android/base

Filed under: L10n,Mozilla — Tags: , , — Axel Hecht @ 6:15 am

I’ve just pushed a minor update to compare-locales to pypi and the dashboard.

The only change is that it applies the android quote tests to the files in mobile/android/base.

As always, update your local installs by

pip install -U compare-locales

December 7, 2011

compare-locales 0.9.2 released

Filed under: L10n,Mozilla — Tags: , , , — Axel Hecht @ 5:38 am

I just uploaded a new release of compare-locales to pypi, hg.m.o and github.

Changes since the last released version:

  • Support for nested l10n.inis, notably, browser/branding.
  • Errors on CSS specs, notably, if en-US is a length or min-width etc, the translation also needs to be one.
  • Warn if CSS specs don’t match in property or unit. Say, en-US gives min-width:14ex and the localization has width:120px, warn. Thanks to Rimas for the request.
  • Warn if en-US is just a number, and the localization is not.

See also the pushes on hg.m.o.

You can install/update with

pip install -U compare-locales

Next up is to use the new version on the dashboard.

It’s not part of our release automation, though. Bug 650465 met some resistance in release-drivers, IIRC, as we’d need to change what we’re shipping in 3.6. More errors means failure unless l10n-merge is on on existing builds, which effectively changes all 20 locales that have errors on 3.6.

« Newer PostsOlder Posts »

Powered by WordPress