Axel Hecht Mozilla in Your Language

July 22, 2011

Data models and “vom Kopf auf die Füße”

Filed under: L10n,Mozilla — Tags: , — Axel Hecht @ 4:28 am

As you all know, we have a new release scheme. That’s all good and great for localization, but there’s one tiny little peppermint: it exposed each and every design problem in the l10n dashboard, code-named elmo these days.

As many folks wonder why I’m still talking about how the l10n dashboard needs more work, I’ll put some details out there.

The Milestone object is the thing we use to keep track of which version of a localization was shipped in which release-style build. It backs views like the Fennec 6 Beta 3 milestone info page, which says “we’re adding pl, and updating nl, ru, zh-TW”. That could be used for QA, verification, etc.

The AppVersion object tracks a particular release, say, Firefox 3.6 or Firefox 6. It contains a series of milestones. The AppVersion objects are tied to an Application object.

The actual compare-locales builds are hooked up to a Tree object, which represents the repositories to compare for a particular application.

The trick is how all these objects are tied together. Gandalf and I designed this back in the days of the Firefox 3.6 release. Back in those days, we had loooong release cycles, with lengthy cycles even for individual milestones, and string freezes for each milestone. At that point, we’d open up sign-offs. Remember, back in the days we wouldn’t have l10n-merge on for release builds, so we could only start reviewing the localizations after string freeze. Also, we did the hg branches for a release early in the cycle, and then we would ship most of our betas from that branch, while development on central progressed merrily.

Thus, our design decisions back then were:

There’s one static repository setup for a version of an application. Umpf. Can you see how bad that is today, where we switch our repo setup every six weeks?

Whether a localizer can sign-off or not depends on whether the upcoming milestone is string frozen or not. In other words, we need to have the upcoming milestone early to begin with, which is such a hassle now that we’re doing them weekly, instead of bi-monthly. Also, with l10n-merge and string-frozen branches, all that logic just … face palm.

Localizers sign off on a version of the application, with a push to its l10n repository. Pushes are per repo, but appversions span repos today. I.e., I push on aurora, sign off, it’s good, the appversion migrates to beta, but the push is still on aurora.

Review actions on sign-offs are forever. Say, I r+ a sign-off on aurora, that goes to beta, but there’s a lack of traction that makes that revision really bad to ship for the next cycle. I can’t make that sign-off bad for Firefox 12 and good for Firefox 11.

Lessons learned:

  • appversions hop from tree to tree, over time
  • sign-offs are per tree, this localization at this point is good, source-wise
  • actions on sign-offs can be per appversion
  • milestones aren’t required before we actually ship something

Or, as we say in German, we have to put the design “vom Kopf auf die Füße”.
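To make the lessons learned a bit more tangible, here is a rough sketch of how the reshuffled relations could look. This is not the elmo schema: the class names other than Tree and AppVersion, the tree code fx_beta, the dates, and the revision are all made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Tree:
    """A repository setup that compare-locales builds run against."""
    code: str


@dataclass
class TreeAssignment:
    """Hypothetical helper: an appversion sits on one tree for a time window."""
    tree: Tree
    start: date
    end: Optional[date] = None  # None means "still on this tree"


@dataclass
class AppVersion:
    """A release such as Firefox 11; it hops from tree to tree over time."""
    name: str
    assignments: List[TreeAssignment] = field(default_factory=list)

    def tree_on(self, day: date) -> Optional[Tree]:
        """Return the tree this appversion was on at the given day, if any."""
        for a in self.assignments:
            if a.start <= day and (a.end is None or day < a.end):
                return a.tree
        return None


@dataclass
class Signoff:
    """Per tree: this revision of this localization is good, source-wise."""
    tree: Tree
    locale: str
    revision: str


@dataclass
class Action:
    """A review verdict on a sign-off, scoped to a single appversion."""
    signoff: Signoff
    appversion: AppVersion
    accepted: bool


aurora, beta = Tree("fx_aurora"), Tree("fx_beta")
switch = date(2012, 1, 31)  # arbitrary placeholder for a merge day
fx11 = AppVersion("Firefox 11", [
    TreeAssignment(aurora, date(2011, 12, 20), switch),
    TreeAssignment(beta, switch),
])
fx12 = AppVersion("Firefox 12", [TreeAssignment(aurora, switch)])

# One sign-off on aurora, judged differently for two appversions.
signoff = Signoff(aurora, "pl", "0123456789ab")
actions = [
    Action(signoff, fx11, accepted=True),
    Action(signoff, fx12, accepted=False),
]
```

The point of the sketch is that the sign-off never references an appversion directly, so the same revision can be good for Firefox 11 and bad for Firefox 12, and no milestone is needed until something ships.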

May 10, 2011

Reviewing sign-offs, slightly different

Filed under: L10n,Mozilla — Tags: — Axel Hecht @ 3:39 am

Opening the magic box of l10n admin stuff:

We’re doing sign-offs, y’know? Localizers hit the l10n dashboard and click a button to say “this revision is good to ship”. Which is cool, because then they don’t need approval for every patch for release branch fixes.

And I’m reviewing the signoffs. Sounds all good, and well proven.

Enter the new release cycle. What’s new? This is a small update, on a quick turnaround. So I can’t do what I did for previous releases, and just not review the first sign-off. That was just an early beta (for most locales) and had 1000 new strings. Sounded fair. Anyway, now we’re just doing 30 strings, so doing an incremental review against what’s on 4.0.1 is in order. So what does that mean?

  1. I need to get the revision that we’re shipping on 4.0.1. For sign-offs that update one branch, that’s all hooked up in the UI, but not for 4.0.x to 5. That’s bug 655943.
  2. The revision on 4.0.1 is on l10n-mozilla-2.0, the signoff on 5 is on a revision of l10n/mozilla-aurora. Neither needs to exist in the other repo, so you can’t just use plain hg commands on a repo. That’s bug 655942.

Now, I wouldn’t be me if I didn’t script myself out of it; here’s the gist of it. And yes, this blog post is as much code commentary as there is for that one.

April 11, 2011

Being a localizer in the rapid release cycle

Filed under: L10n,Mozilla — Tags: , — Axel Hecht @ 6:15 pm

We’re changing to a 6-week release train model, and this is going to impact how localizers do their contributions. The following scheme has been cycled in .planning for a bit, so this is what we’ll be doing. We’ll adapt that if needed, of course, but based on experience with the next cycle or two.

Recap on the rapid release cycle: en-US developers work on mozilla-central, as they used to, and every 6 weeks, we’ll pull their contributions to another repository, called mozilla-aurora. That repository is string frozen. String changes only land in this repository as part of the merge from central to aurora. After another 6 weeks, the content goes to yet another repository, mozilla-beta. Corresponding to those, there’s l10n/mozilla-aurora and l10n/mozilla-beta. And now you know. Find a glossary at the end of this post.

There are two different localizer schemes: Early birds and friends of string freeze. Read the following descriptions and pick one for your individual localization team.

Early Birds are those localization teams that are happy to follow the mozilla-central content quickly and make sure that all issues relating to localizing that code are found and fixed. We already have a few of those that have built their reputation among our hackers to have good input to follow. We don’t need a lot of those, but the ones we have are crucial to make the plan work, and have code that is properly localizable at any time on aurora. You’ll be following the fx_central tree on the l10n dashboard to catch up on changes.

Friends of String Freeze are those teams that prefer to have stable content to localize with a decent time window to act on it. Many of our localization teams are in this group. If you’re in this group, you’ll set your calendar alarm to the next window, hg pull -u on your mozilla-aurora clone and your l10n/mozilla-aurora clone, localize, push, test, fix, push, sign-off. Then you set your calendar to the next 6-week cycle, and you’re all set. The expectation here is that the amount of strings will be rather low, so a day of l10n plus testing and fixing is fine. Usually, you should be able to deliver a great localization for the next version of Firefox in some 3 days. Firefox 5 right now is some 30 strings, other releases will be a good deal bigger. But nowhere close to the 1.2k strings of Firefox 4. You’ll be watching the fx_aurora tree on the l10n dashboard to see the status of your localization.

Sign-offs will happen on aurora, in rare cases on beta. The setup where we work towards release is aurora.

What about the beta repositories? Well, I hope to not see a necessity to land on l10n/mozilla-beta for the most part. You should expect that changes you make on l10n/mozilla-beta will be dropped once we do the next update from aurora, so you want to have the fixes on both aurora and beta, if applicable. But really, you want to be good on aurora. Then beta will be fine and no hassle.

How that maps to mercurial work:

For the Friends of String Freeze, you’ll not need to worry about anything other than pulling on both repos every cycle. We’ll take your content from l10n/mozilla-aurora to l10n/mozilla-beta, and may very well at some point stop doing l10n-central builds at all for you. Just keep things simple here.

For the Early Birds, we’ll rely on you self-identifying and doing a tad of extra work. You’ll be in best shape to merge your contributions from l10n-central to l10n/mozilla-aurora, making sure that the result has all your fixes from both central and aurora, where you want them. You’re techy-geeky-savvy anyways, so that’s all right. If at some point, we learn that there’s a pattern that benefits from automation, we’ll check in on that when we get there, too. You shouldn’t have to worry about getting content on l10n/mozilla-beta any more than the rest, though.

Glossary:
mozilla-central is the mercurial repository that en-US code is landed to as development makes progress.
l10n-central is the tree of mercurial repositories that the early-bird localizers use as development makes progress.
central is short for either, or both, of mozilla-central and l10n-central, depending on context.

The terms around mozilla-aurora, l10n/mozilla-aurora, and aurora map to their corresponding terms for central, same for mozilla-beta, l10n/mozilla-beta, and beta.

Update: Fixed the links to map to the new and stable repository locations.

November 24, 2010

compare-locales 0.9.1 is out

Filed under: L10n,Mozilla — Tags: , — Axel Hecht @ 7:30 am

I released compare-locales 0.9.1 yesterday on pypi. Do the regular

easy_install -U compare-locales

to update your local copy.

This update includes two bug-fixes compared to 0.9,

  • Don’t warn about XML-defined entities like &amp;, bug 604404
  • Ensure that merged entities have a trailing newline, bug 612619

In particular the latter will make our l10n-merge code more stable. Sadly, we actually need to fix all the newly-reported errors in all stable branches and apps before we can update the production tag. Errors make compare-locales fail, and rightfully so. And fail is bad for release builds that don’t merge, also rightfully so.

November 5, 2010

MultilingualWeb: Workshop in Madrid

Filed under: L10n,Mozilla — Tags: , , — Axel Hecht @ 8:24 am

So I was at the W3C MultilingualWeb Workshop in Madrid last week, and I guess there are a few things worth reporting.

MultilingualWeb is a project bound to host 4 workshops to bring people from different fields together to see how standards and best practices (existing and not) can help the web. Being mozilla, we don’t really need to add that it’s beyond just one language, right? The effort is strongly supported by the European Union, so there’s a bias towards participants in these workshops being from Europe, though the folks by themselves certainly talk beyond that.

The crowd in Madrid was really diverse: standards people, government (EU and India), researchers, content, and, well, browsers. The browser people were Charles McCathieNevile (Opera), Jan Nelson and Peter Constable (Microsoft), and me (Mozilla). There were no folks from webkit-based browsers.

Interesting bits and pieces:

I guess other people made that experience lately, too, but I welcome the way that MSFT is positioning themselves lately. Now they just need to compare beta builds to beta builds, and, (insider joke) while we hack on canvas, you learn JS:

- ctx = canvas1.getContext("2d");
+ ctx = document.getElementById('canvas1').getContext("2d");

Still need to actually look at the results in competing browsers, and not on my font-broken OSX, but we’re not doing too bad.

Gecko should really start using CLDR data for stuff like plurals, dates, calendars, lists. I should also really read up on ES’ i18n_api.

It was interesting to see common questions on what’s a language from Denis Gikunda, who’s working on l10n for Google in sub-Saharan Africa. Now that Anloc is coming in with their localizations, we’re getting more exposed to how the history of those languages is so different from European ones.

Facebook’s Ghassan Haddad reported on a few interesting things. Like Zuckerberg coming into his interview with “you can’t slow our development down”. Interesting about this is that the resulting infrastructure is far from zero-impact on the development. There are quite a few restrictions on what content you can put up, and you have to add syntactic sugar all over, too. Go check their docs for details. Also, they’re not slowing down the publishing of localizations.

We got a bit of detail in the discussion about vandalism in fb l10n. They initially relied on community there, but when they got hit, they took down the localized sites until they had tooling support. Ghassan didn’t come forward with details on what they do, though.

They are doing something conceptually similar to l20n to localize their social messages like “A is now friend with B, C”, to make those depend on all the genders. IIRC, they call it string or entity explosion. Didn’t get to ask any questions about this one, sadly.

Most of the science people talked about processes that all sound very good for the data we get from feedback in Firefox 4 betas. Natural language processing with trends detection, “translation” of SMS Spanish into Spanish, and much more. Sadly, there’s nothing shrink wrapped that we could just use, but there’s interest in creating a project to find out, maybe for Firefox Next?

One thing that felt slightly odd was the Semantic Web. I thought that was dead, but there’s still optimism around that. Maybe semantics that help machine translation make a case for it, I’m not sure. Also, there seems to be more structured data coming to the “public web”, and the algorithms that transform the “hidden web” into the “public web” could more easily add markup than human authors would. Still, there wasn’t much hope among the browser people. Luckily, the browser doesn’t really need to do anything but create a DOM and pass markup around, so that machine translation engines can benefit from the additional semantics.

Last but not least, I did finally get to spend some quality time with our Madrid community, thanks to the folks for taking me out twice. I had a great time, and sorry that my English speaking tempo aligned with your Spanish speaking tempo way too often :-).

October 4, 2010

Releasing compare-locales 0.9, aka, the value checker

Filed under: L10n,Mozilla — Tags: , — Axel Hecht @ 9:06 am

I’ve just uploaded version 0.9 of compare-locales onto pypi. It’s finally the version that does all the fancy value checks that I’ve been talking about for a while, and that some of the localizers have seen flying by in their bugmail.

Here’s what it does:

For DTDs, I create fake xml docs, and try to parse them. This should find encoding errors, as well as unbalanced XML tags or stray ‘&’ ampersands. There’s one thing that’s tricky, and that is references to entities. I do get the list of entities from en-US, so I do have a good idea which should work (really, please). On the other hand, referencing other entities may not be an error. &rdquo; for example could be totally fine. If referenced in an XHTML document, that is. Not if it was included in a XUL document. Of course both breeds could include the same DTD file. I can’t really tell, so I’ve added a new category of reporting, called warnings.
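As an illustration of the fake-document idea, here is a minimal sketch using Python’s expat bindings. It is not the code in compare-locales; the function name check_dtd_value and the element name elem are made up, and it only looks at a single entity value at a time.

```python
from xml.parsers import expat
from xml.parsers.expat import errors

# Expat's numeric code for "undefined entity", looked up via its message.
UNDEFINED_ENTITY = errors.codes[errors.XML_ERROR_UNDEFINED_ENTITY]


def check_dtd_value(value, known_entities=()):
    """Return None, ("warning", msg) or ("error", msg) for one entity value.

    known_entities stands in for the list of entity names taken from en-US.
    """
    # Declare the known entities with empty values, so references resolve.
    decls = "".join('<!ENTITY %s "">' % name for name in known_entities)
    # Wrap the localized value into a throw-away document.
    doc = "<!DOCTYPE elem [%s]>\n<elem>%s</elem>" % (decls, value)
    parser = expat.ParserCreate()
    try:
        parser.Parse(doc, True)
    except expat.ExpatError as e:
        if e.code == UNDEFINED_ENTITY:
            # Might be fine in XHTML, might be broken in XUL: just warn.
            return ("warning", str(e))
        return ("error", str(e))
    return None
```

With this sketch, a value like "Save & Quit" or "<b>bold" comes back as an error, while a reference to an entity that en-US doesn’t define comes back as a warning.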

For properties files, I check a bunch of printf tricks. Some of those are warnings, some of those are errors. Which is which basically depended on code-inspection. I also did some heuristics based on comments referencing the plural docs to check for our plurals-special variable handling.
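To give an idea of what such a printf check can look like, here is a small sketch that compares the arguments used in en-US with those used in the localization. It is not the compare-locales implementation. The function names are made up, and the split into errors and warnings below is my assumption for the sake of the example, not the result of the code inspection mentioned above. It also ignores the plurals handling.

```python
import re

# Matches %S, %d, %1$S and friends. "%%" is a literal percent sign.
SPEC = re.compile(r"%(?:(\d+)\$)?([A-Za-z%])")


def printf_args(value):
    """Map argument position to conversion letter for one properties value."""
    args = {}
    next_pos = 1
    for match in SPEC.finditer(value):
        pos, conv = match.groups()
        if conv == "%":
            continue  # escaped percent sign, consumes no argument
        if pos is None:
            # Unnumbered specifiers take the arguments in order.
            pos = next_pos
            next_pos += 1
        args[int(pos)] = conv
    return args


def check_printf(ref_value, l10n_value):
    """Compare a localized value against en-US, return (level, message) pairs."""
    ref, l10n = printf_args(ref_value), printf_args(l10n_value)
    issues = []
    for pos in sorted(set(l10n) - set(ref)):
        # Assumption: using an argument the code doesn't pass is an error.
        issues.append(("error", "argument %d is not in en-US" % pos))
    for pos in sorted(set(ref) & set(l10n)):
        if ref[pos] != l10n[pos]:
            issues.append(("error", "argument %d changed type" % pos))
    for pos in sorted(set(ref) - set(l10n)):
        # Assumption: dropping an argument is only worth a warning.
        issues.append(("warning", "argument %d is unused" % pos))
    return issues
```

Reordering with numbered arguments, as in "%2$S von %1$S" for "%S of %S", passes without complaints in this sketch.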

The installer variable checks are still outstanding; I didn’t want to hold back this release for that. They’re somewhat tricky in the details and yet more tedious to get right than the other checks.

What does that mean for localizers? You wanna get the error count down to zero. The warnings count may or may not go down to zero, that’s your call.

The new version isn’t in public use anywhere yet, the deployment will go like this:

  • Get a round of public feedback on this release.
  • Use on the dashboard (likely gonna happen when I do the 2.0 branch dance, too).
  • Try to get the new version used on the build system.

Please give the new version a bit of pounding in your local l10n-merge builds, too. It should strip entities with errors from your localization, and merge in en-US strings for that.

Feel free to file bugs on issues you find.

August 31, 2010

Lazyweb, can I have compilers in js and python?

Filed under: L10n,Mozilla — Tags: — Axel Hecht @ 8:10 am

For our l20n project, we’ll want to compile l20n source files into javascript. We want to do that both at compile time, and at runtime.

For runtime, I’ll need the compiler written in js in all likelihood, and for compile time, I’d rather go with python so that I don’t have to build a HOST_JS or something. Of course, I don’t want to maintain two completely independent compiler implementations.

Thus I’m looking for code that can generate compilers in js and python, preferably itself in python or some other language we can use at build time, or at least use for one-off compilations.

Any tips?

July 27, 2010

Porcupine, meet Churchill

Filed under: L10n,Mozilla — Tags: , — Axel Hecht @ 12:16 pm

I’ve been talking with Seth today on how we can answer questions about the status of l10n. My grumpy argument was that I wouldn’t know how to make graphs over time actually show progress, instead of just “failure”. I had two naive graphs, one is showing all missing strings summed up over all locales. That graph would be dominated by the long tail of several dozen locales with a few hundred strings each, and you wouldn’t see a dozen fighting over a few strings each.

The other is what I nick-name “porcupine graph”, showing how many locales have no missing strings, vs those that have some missing strings. This is what’s actually implemented on the l10n dashboard as tree progress graphs. But however small a string change would be, it goes to all red. And it doesn’t help that one can’t mix green and red color gradients, so the graph usually shows spikes of red and a little black.

[Graph: porcupine]

Who’d want that as their progress stats, huh?

Now, during the chat with Seth I came up with the idea to just give a little bit of leeway, and accept some missing strings to be OK, at least for some time. I filed bug 582280 on that, and made a rough initial implementation of it. Nothing fancy, just a constant ignored bound of missing strings. Let’s see how the past two weeks of Firefox 4 look now, with just a total of 5 missing strings being OK, ?bound=5:

[Graph: two weeks good and bad]

Now Churchill won over the porcupine, but it’s still pretty red. Which is OK, we haven’t even branched yet, right? So I went ahead and figured I’d add an option hideBad:

[Graph: two weeks good]

Wow, progress. This graph actually looks like our community rocks as much as it does. Gets me grumpy, because this was really just about half an hour of work, plus a few years of thinking.
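For the curious, the bound and hideBad options boil down to something like the following sketch. This is not the dashboard code; the function name progress_points and the shape of the input data are assumptions for illustration.

```python
def progress_points(snapshots, bound=0, hide_bad=False):
    """Turn per-locale missing-string counts into points for a progress graph.

    snapshots is a list of {locale: missing_strings} dicts, one per point in
    time. A locale counts as good if it misses at most `bound` strings.
    """
    points = []
    for missing_by_locale in snapshots:
        good = sum(1 for n in missing_by_locale.values() if n <= bound)
        point = {"good": good}
        if not hide_bad:
            # Everything that isn't good is bad, there's no in-between.
            point["bad"] = len(missing_by_locale) - good
        points.append(point)
    return points


# Made-up numbers: a small landing hits everybody, and folks catch up.
snapshots = [
    {"de": 0, "pl": 3, "fr": 40},
    {"de": 2, "pl": 0, "fr": 12},
]
strict = progress_points(snapshots)
relaxed = progress_points(snapshots, bound=5, hide_bad=True)
```

With a bound of 0, a two-string landing throws de back into the red; with a bound of 5, de and pl stay good through both snapshots.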

Now, how do we look on the long run, say, well over half a year? Bumping the bound up to 15, we’re doing like

[Graph: half year progress]

Pretty good, heh? You can play with it on the dashboard, too. The overall takeaways would be:

We have about 20 locales that really track trunk.

We didn’t have that many landings with a high amount of added strings.

I like both :-).

July 26, 2010

Looking at a l10n bugzilla classification

Filed under: L10n,Mozilla — Tags: , — Axel Hecht @ 1:44 pm

We intend to move from components per locale in the “Mozilla Localizations” product to a matrix of products per locale, and components for each of Firefox, Thunderbird, et al. I’ve created an add-on to set up the products and components, as laid out in the newsgroup thread. I wanted to share some screenshots of how things look locally now.

enter_bug.cgi?classification=Mozilla in Your Language looks like this:

[Screenshot: enter_bug.cgi]

Localizers can edit the descriptions on localize.m.o. I’m not totally convinced that the current formatting of the products is great. The double () braces disturb me, both here and on the actual bug form (see below). I might prefer “l10n:ab-CD Language (Region)”.

[Screenshot: Enter bug]

This is the actual bug entry form, and shows the localized component description. It also shows a rather confusing line wrapping of the product name.

Another aspect that we were concerned about was how it’d look if you changed the product of a bug. Locally, this looks like this now:

[Screenshot: Re-productize bug]

Got comments? Please leave them in the original newsgroup thread, or here.

July 21, 2010

l20n meetup in the european times

Filed under: L10n,Mozilla — Tags: — Axel Hecht @ 3:02 pm

I heard there was interest to join the l20n discussions, so I’ll do an “even more public” invitation to tomorrow’s l20n call.

We’re going to have that call on conference bridge 206 at 11 am CET, standard mozilla conference call details. Blame Seth for being almost-european these days, even if it’s just timezone. (No, not calling London Europe, no way.)

The agenda for this call is to look at some l20n-compiled files for browser.xul to make an educated decision on how to encode both external vs internal properties, and multi-locale files.

Sorry for the late invite, if you’re not on my radar and can’t make the meeting at such European hours, please follow up here, or by mail, and leave a note of your timezone.
