Jeremy Hiatt

The highlights of my summer internship with Mozilla were undoubtedly my trip to Berlin and my runner-up finish in the office ping-pong tournament. In between, I made some cameo visits to my desk to keep up appearances. My work as a summer intern on the l10n-drivers team focused on developing l20n, an experimental localization project designed to support sophisticated grammar. L20n (localization, 2.0) allows a localizer to express complexities of language that cannot be captured in any current localization scheme. I picked up the prototype implementation (written in JavaScript) and set out to improve its performance. My goal was to demonstrate that l20n could be fast enough to be a viable replacement for Mozilla’s current localization framework.

I took several different approaches to speeding up the existing l20n implementation, with varying degrees of success. The first idea was to improve its use of regular expressions. The original implementation used an inefficient algorithm to consume the input file, which appeared ripe for optimization. Quite surprisingly, though, the “improvements” that I made actually regressed performance on a key testcase! Following this discovery, we theorized that our JavaScript was poorly suited to TraceMonkey, Mozilla’s new JS engine, so I sought to evaluate and hopefully improve our tracing performance. The key idea here was to determine which of our lines of code gave TraceMonkey trouble. I learned to use TraceVis, a tool developed by Dave Mandelin that displays a colorful graph showing how much time the engine spends in each mode. These graphs give a sense of how much time is spent running compiled traces rather than falling back to the interpreter. As it turned out, all the extra work that TraceMonkey did to speed up JavaScript was for naught in our case, since l20n used JS features that TraceMonkey could not yet handle efficiently. These explorations didn’t pan out in terms of performance gains, but I had a lot of fun testing out experimental patches to TraceMonkey and seeing if they helped.

The next approach was to experiment with JavaScript Object Notation (JSON) to encode the localized strings. The idea was to leverage the performance of Firefox’s built-in JSON parser to speed up l20n. JSON gave us a huge boost in parsing performance, but l20n was still only about half as fast as it needed to be (still better than being ten times too slow, which is roughly where we were before). Besides being not quite fast enough, JSON was also an awkward format for expressing some of the advanced features of l20n. At this point in the summer, I was lucky enough to be invited to a work week in Berlin with the rest of the l10n-drivers team. While in Berlin, we were kicking around ideas to make JSON work more smoothly with l20n when we decided to try “compiling” our file format into native JavaScript. Compilation isn’t a new idea in localization, but for our purposes it turned out to be a very fruitful technique.
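
To give a feel for why JSON was both fast and awkward, here is a minimal sketch. The entity names, the `one`/`other` layout, and the `getString` helper are all hypothetical and are not the actual l20n format. Parsing is a single call to the built-in parser, but any grammar logic has to live outside the data:

```javascript
// Hypothetical JSON encoding of localized strings (not the real l20n schema).
const source = JSON.stringify({
  greeting: "Hello, world!",
  // A plural-sensitive string must be spelled out as plain data,
  // because JSON cannot carry any logic of its own.
  tabsOpen: { one: "{n} tab open", other: "{n} tabs open" }
});

// The built-in parser does all the heavy lifting in one call.
const entities = JSON.parse(source);

// Hypothetical lookup helper: the plural rule is hard-coded here,
// outside the localizer's file, which is the awkward part.
function getString(id, args) {
  const entry = entities[id];
  if (typeof entry === "string") {
    return entry;
  }
  const form = args.n === 1 ? "one" : "other"; // English-only rule
  return entry[form].replace("{n}", String(args.n));
}
```

With this layout, a language whose plural rules differ from English would need changes to the helper rather than to the localized file.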

Upon returning to Mountain View, I hacked together a prototype that would convert a source l20n file to pure JS. Initial profiles showed that compiled l20n was fast enough to replace the current l10n infrastructure without regressing performance. Another nice benefit of compilation was that the performance wasn’t bound to a particular source format, so in my last few weeks I did some blogging about our various options for file format in hopes of attracting some fresh ideas.
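
The compilation idea can be sketched roughly as follows. This is not my prototype: the `id = text` source syntax, the `{{name}}` placeholders, and the `compile` function are simplified assumptions made up for illustration. The point is only that the source file is turned into ordinary JavaScript functions, so at run time the engine executes plain code instead of parsing a custom format:

```javascript
// Hypothetical, simplified source format: one "id = text" pair per line,
// with {{name}} placeholders. This is NOT the real l20n syntax.
const source = [
  "greeting = Hello, {{user}}!",
  "farewell = Goodbye"
].join("\n");

// Translate the source into JavaScript source code: one function per entity.
function compile(src) {
  const members = src
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const eq = line.indexOf("=");
      const id = line.slice(0, eq).trim();
      const text = line.slice(eq + 1).trim();
      // Splitting on a capturing group leaves placeholder names
      // at the odd indexes and literal text at the even ones.
      const parts = text.split(/\{\{(\w+)\}\}/).map((part, i) =>
        i % 2 === 1 ? "args[" + JSON.stringify(part) + "]" : JSON.stringify(part)
      );
      return JSON.stringify(id) + ": function (args) { return " +
        parts.join(" + ") + "; }";
    });
  return "({\n  " + members.join(",\n  ") + "\n})";
}

// In practice the compile step would run ahead of time and the output
// would be shipped as a script; here it is evaluated on the spot.
const compiled = compile(source);
const l10n = new Function("return " + compiled)();
```

Calling `l10n.greeting({ user: "Ada" })` then runs an ordinary function. Because only the compiler reads the source syntax, the syntax can change without affecting run-time performance.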

I couldn’t be more grateful to Mozilla for my fun-filled, productive summer, and especially for the trip to Berlin! And finally, a description of my internship wouldn’t be complete without mentioning Samantha, my new favorite dog. She’s a worthy adversary in a game of tug-of-war.