November/December Accomplishments of the WebProd Team

Chris More

Wow. It has been a really busy second half of Q4 for the Web Productions team! I wanted to share some recent accomplishments from the team and what we are up to next.

Recent Key Accomplishments

Snapshot of Upcoming Projects

Want More?

Enjoy the rest of 2013 and see you all in 2014!

Improving user experience for people all over the world

Kohei Yoshino

Every day, thousands of people visit mozilla.org from across the globe. As the hub site of the global Mozilla community, it offers a variety of content, including the Firefox download page, which has been localized into 80+ languages — like Firefox itself — thanks to the tremendous contributions of volunteer localizers.

I can remember mozilla.org at an early age: a developer-oriented, documentation-centric, geeky site. Some community members were translating those documents into their own languages (I myself translated hundreds of docs into Japanese), but their work was done outside the official site. As time passed, mozilla.org has become one of the most popular multilingual consumer sites on the Internet. The site is still under active development, though. With the increasing number of translations, how can visitors from around the world make the most of this content? In this article I’ll briefly explain some great new features I have contributed over the past few months that may lead to a better user experience for people who speak different languages.

Language Switcher

mozilla.org is migrating from the original PHP-based site to a new, robust Python-based platform called Bedrock. The legacy site had a language switcher at the bottom of each page, but there was a problem: the switcher showed all supported languages regardless of whether the current page was actually localized. When a user chose Français from the list but the page wasn’t translated into French, they were taken back to the original English page. That behavior confused them because the page just reloaded without changing anything.

The new language switcher on Bedrock only shows the languages each page is actually localized into, and works as expected. The number of languages will continue to increase as localizers add new translations. (Bug 773371)

We believe we can improve the language switcher even further. A simple dropdown menu is accessible but difficult to use when the list becomes too long. Also, my recent research showed that the language-switching experience varied among the Mozilla properties. It’s obvious we need a better, shared solution, like the Tabzilla universal site navigation widget. (Bug 919869)

Search Engine Optimization

The Language Switcher is not a collection of links but a dropdown menu using <select>, so it cannot tell search engines about our localization. Googlebot and others are smart enough to mechanically submit such a simple form, but they definitely prefer a better way to crawl.

We have implemented alternate URLs to solve the issue with just a little more HTML. Visit the home page and hit Ctrl+U (or Cmd+U) to view the source, and you’ll find a list of <link rel="alternate" hreflang="x"> elements. Search engines recognize the list and can show a localized page, if available, in their search results based on the searcher’s language. (Bug 481550)
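For illustration, that alternate list can be generated from the set of available translations. A hypothetical Python sketch — the real implementation lives in Bedrock’s templates, and the URL scheme used here is an assumption:

```python
def alternate_links(path, locales):
    """Build <link rel="alternate" hreflang="..."> tags for each available
    translation of a page. `path` is the locale-less URL path; the
    /<locale><path> URL scheme is an assumption for this sketch."""
    tags = []
    for locale in sorted(locales):
        tags.append(
            '<link rel="alternate" hreflang="%s" '
            'href="https://www.mozilla.org/%s%s">' % (locale, locale, path)
        )
    return "\n".join(tags)

print(alternate_links("/firefox/new/", ["en-US", "fr", "ja"]))
```

One such element per translated locale is all a crawler needs to pick the right variant for a searcher.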

We’ll also soon serve comprehensive XML sitemaps with the alternate URLs as part of our SEO efforts. (Bug 906882)

Translation Bar

This is the latest cool addition to mozilla.org. You might have seen similar functionality if you have installed Google Toolbar or the lovely Chrome browser: it may ask whether you’d like to translate a foreign-language page into your language with Google Translate. While that’s useful in general, the quality of machine translation largely depends on the language. For example, Japanese, my mother tongue, is one of the most difficult languages for machine translation. Here at mozilla.org, we have pages manually translated by native-speaking localizers, so why not offer our visitors the nicely localized page? That was the motivation for the new Translation Bar.

The implementation was straightforward. As described above, we already have the alternate, localized URLs in the page source. A script compares the browser locale (navigator.language) against the list, then shows the bar if a translation is available in that language. If the user selects Yes, please, they are promptly redirected to the localized page. If the user selects No, thanks, sessionStorage remembers the preference and the bar stays hidden for the rest of the browsing session.
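The matching step can be sketched in Python (the real code is client-side JavaScript working against navigator.language; the fallback from a full locale like "fr-FR" to the bare language code is an assumption about the behavior):

```python
def pick_translation(browser_locale, available):
    """Return the best available translation for the browser locale, or
    None. Tries an exact (case-insensitive) match first, then the bare
    language code, so "de-DE" can still match a page translated as "de"."""
    lookup = {loc.lower(): loc for loc in available}
    key = browser_locale.lower()
    if key in lookup:
        return lookup[key]
    lang = key.split("-")[0]
    return lookup.get(lang)

# A German browser visiting a page translated into de, fr and en-US:
print(pick_translation("de-DE", ["de", "fr", "en-US"]))  # prints: de
```

If this returns a locale, the bar is shown with a link to that locale’s URL from the alternate list; if it returns None, the visitor already has the best match.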

Of course, the labels on the bar are also localized. Visit a localized page to give it a try! The Translation Bar has just been deployed on mozilla.org, and other Mozilla sites may adopt it soon. (Bug 906943)

Beyond Translation

As a Japanese Web developer and localizer, I know localization is not just translation. Each language and country has its own culture, customs and perspectives. Under the Mozilla mission, the Web Productions team is working hard to deliver a great experience for everyone. The ongoing challenges include localized news and promotions on the home page, better fonts for multibyte characters, layout improvements for RTL languages, and more. I’m very glad to help the team.

Mozilla is a lively, global, successful open-source community, and mozilla.org is not merely a corporate site. You can contribute in many ways, as I do. Do you speak a language other than English? Become one of the awesome localizers! Did you find any bugs on mozilla.org, or do you have feedback on how to improve the site? Let us know via Bugzilla! Are you a Web developer with knowledge of HTML, CSS, JavaScript, or Python? Fork the GitHub repository, browse the bugs and send us pull requests!

Tracking Deploys in Git Log

Mike Cooper

Knowing what is going on with git across many environments can be hard. In particular, it can be hard to know where the server environments are in the git history, and how the rest of the world relates to that. I’ve set up a couple of interlocking gears of tooling that help me know what’s going on.


One thing that I love about GitHub is its network view, which gives a nice high-level overview of branches and forks in a project. One thing I don’t like about it is that it only shows what is on GitHub, and is a bit light on details. So I did some hunting, and I found a set of git options that does a pretty good job of replicating GitHub’s network view.

$ git log --graph --all --decorate

I have this aliased to git net. Let’s break it down:

  • git log – This shows the history of commits.
  • --graph – This adds lines between commits showing merging, branching, and
    all the rest of the non-linearity git allows in history.
  • --all – This shows all refs in your repo, instead of only your current branch.
  • --decorate – This shows the name of each ref next to each commit, like
    “origin/master” or “upstream/master”.

This isn’t that novel, but it is really nice. I often get asked what tool I’m using for this when I pull this up where other people can see it.

Cron Jobs

Having all the extra detail in my view of git’s history is nice, but it doesn’t help if I can only see what is on my laptop. I generally know what I’ve committed (on a good day), so the real goal here is to see what is in all of my remotes.

In practice, I only have this done for my main day-job project, so the update script is specific to that project. It could be expanded to all my git repos, but I haven’t done that. To pull this off, I have this line in my crontab:

*/10 * * * * python2 /home/mythmon/src/kitsune/scripts/

I’ll get to the details of this script in the next section, but the important part is that it runs git fetch --all for the repo in question. To run this from a cron job, I had to switch all my remotes from the ssh protocol to https, since my SSH keys aren’t unlocked. Git knows the passwords to my https remotes thanks to its gnome-keychain integration, so this all works without user interaction.

This has the result of keeping git up to date on what refs exist in the world. I have my teammates’ repos as remotes, as well as our central master. This makes it easier for me to see what is going on in the world.

Deployment Refs

The last bit of information I wanted in my local network view is the state of deployment on our servers. We have three environments that run our code, and knowing what I’m about to deploy is really useful. If you look in the screenshot above, you’ll notice a couple of refs that are likely unfamiliar: deployed/stage and deployed/prod, in green. This is the second part of the script I mentioned above.

As part of the SUMO deploy process, we put a file on the server that contains the current git sha. This script reads that file and creates local refs in my git repo that correspond to it.
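The ref-writing step boils down to validating the sha that was fetched and shelling out to git update-ref. A sketch — the helper and its names are mine, not the actual kitsune script:

```python
import re

def deploy_ref_command(env, sha_text):
    """Build the `git update-ref` invocation pointing
    refs/heads/deployed/<env> at the sha the server reported.
    `sha_text` is the raw contents of the sha file the deploy process
    leaves on the server; validating it guards against an error page
    or truncated download ending up as a bogus ref."""
    sha = sha_text.strip()
    if not re.fullmatch(r"[0-9a-f]{7,40}", sha):
        raise ValueError("not a git sha: %r" % sha)
    return ["git", "update-ref", "refs/heads/deployed/%s" % env, sha]
```

The real script fetches the file over HTTP for each environment, runs git fetch --all, and hands each command list to subprocess.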

Aside: What’s a git ref?

A git ref is anything that names a commit sha. So master is a ref, and so are any other branches you create. Git also tracks remote content in the same way, in refs under refs/remotes.

In short, a git ref is a generalization of tags and branches, both remote and local. It is how git keeps track of things with names, and it is what is written on the graph when --decorate is passed to log.

Wait, the script creates git refs from thin air? Yeah. This is a cool trick my friend Jordan Evans taught me about git. Since git’s references are just files on the file system, you can make new ones easily. For example, in any git repo, the file .git/refs/heads/master contains a commit sha, which is how git knows where your master branch is. You could make new refs by creating and overwriting these files manually, but that’s a little messy. Instead we should use git’s tools to do this.

Git provides git update-ref to manipulate refs. For example, to make my deployment refs, I run something like git update-ref refs/heads/deployed/prod 895e1e5ae. The last argument can be any sort of commit reference, including HEAD or branch names. If the ref doesn’t exist, it will be created, and if you want to delete a ref, you can add -d. Cool stuff.

All Together Now

Now, finally, the entire script. Here I am using a git helper that I wrote, which I have omitted for space. It works how you would expect, translating git.log('some-branch', all=True) to git log --all some-branch. I made a gist of it for the curious.
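Since the helper is omitted here, a sketch of what such a helper can look like — an assumption in the same spirit, not the author’s actual code:

```python
import subprocess

def build_command(subcommand, *args, **kwargs):
    """Translate pythonic arguments into a git invocation, e.g.
    build_command('log', 'some-branch', all=True, decorate=True)
    -> ['git', 'log', '--all', '--decorate', 'some-branch']."""
    cmd = ["git", subcommand.replace("_", "-")]
    for name, value in sorted(kwargs.items()):
        flag = "--" + name.replace("_", "-")
        if value is True:
            cmd.append(flag)                      # boolean flags: --all
        elif value is not False:
            cmd.append("%s=%s" % (flag, value))   # valued flags: --max-count=5
    cmd.extend(args)
    return cmd

class Git:
    """git.log('some-branch', all=True) builds the command above and runs it."""
    def __getattr__(self, subcommand):
        def run(*args, **kwargs):
            return subprocess.check_output(
                build_command(subcommand, *args, **kwargs), text=True)
        return run

git = Git()
```

With this, the git net view from earlier is just git.log(graph=True, all=True, decorate=True).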

The basic strategy is to fetch all remotes, then add or update the refs for the various server environments using git update-ref. This runs from cron every few minutes, and makes knowing what is going on a little easier, and git in a distributed team a little nicer.

That’s It

The general idea is really easy:

  1. Fetch remotes often.
  2. Write down deployment shas.
  3. Actually look at it all.

The fact that it requires a little bit of cleverness, and a bit of git magic along the way, means it took some time to figure out. I think it was well worth it though.

Originally from

One C++ Tokenizer Too Many: A DXR Story

Erik Rose

When your codebase is 2GB, grep doesn’t cut it anymore. It’s slow, and, in such a large corpus, many attempts to find a symbol get drowned out by false positives. Even modern IDEs begin to choke under the load. This is the domain of DXR, Mozilla’s tool for doing structured queries, free-text searches, and even trigram-accelerated regex matching on large projects like Firefox.

Of course, it’s a software engineering truism that providing speed at a moment’s notice exacts a price in pre-computation, and DXR is no exception. Every night, we run the entire mozilla-central codebase through the clang compiler, injecting a custom plugin which sees what the compiler sees and writes it all down in a database that can dish out fast answers later.

Except when things go awry.

During the Mozilla Summit, DXR had a conveniently timed series of failed indexing runs. A bit of digging revealed that, while the mozilla-central compilation was going off without a hitch, a run of the source through our custom C++ tokenizer was exploding in a later phase.

Wait—custom C++ tokenizer?!

This worn but dutiful little fossil harkens back to DXR’s pre-clang days. In the early Cretaceous, when gcc ruled the earth, we didn’t have an easy framework for compiler plugins; we had to get by on the clever application of heuristics. But, as the millennia wore on and the clang ecosystem evolved, the uses of the custom tokenizer eroded, until its only remaining purpose was to find #include directives so we could guess where they pointed—which we got wrong half the time anyway. It was time to toss that strategy in a tarpit.

And so, after a little compiler plugin tinkering, I’m pleased to announce that DXR now resolves all includes simply by lifting the correct answer out of clang. Before, we would often throw up our hands when including a file without a totally unique name (which happened a lot). Now, with only a few exceptions for weird macro corner cases, we successfully link all non-generated, tree-dwelling includes. And, of course, we lay the maintenance burden of tokenizing C++ squarely on the compiler’s shoulders, where it belongs.

Want to join us in hacking on compiler plugins, with a generous dollop of Python back-end code? Pitch in at

Updates from the Web Productions team

Chris More

Recent Key Accomplishments

Snapshot of Upcoming Projects

Interesting team statistic

  • Mozilla has roughly 50 websites, and the Web Productions team manages and develops 8 websites that account for 73% of all Mozilla web traffic!

More Info

Have any questions? Chat with us in IRC in #www or #webprod.

Full-text search in Air Mozilla with PostgreSQL

Peter Bengtsson


In a previous post I explained why and how we migrated Air Mozilla to use PostgreSQL as the default database. We did this so we can leverage PostgreSQL’s powerful full-text search feature.

First, off on a tangent we go… Why not use the popular and powerful full-text search engine ElasticSearch? Surely, since it’s built on top of Apache Lucene, it’s bound to have some amazing full-text search and indexing features. I’m sure it does — but we don’t need them.

All we want to do is find records whose title, description or short_description contain certain words sharing the same stem. We also want highlighting so we can display a neat search results page with the matches emphasized (something that isn’t easy to do with regular expressions in Python once the results come back).

PostgreSQL can do all of that and it’s fast. Very fast! By far, the biggest win of using the same database we already connect the Django ORM to is that we simply don’t have to worry about indexing. Like, at all. All you do is set this up as a migration:
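The migration code didn’t survive this excerpt; its essence is a pair of GIN indexes over to_tsvector() expressions, something like the following sketch (the table and column names are assumptions about the schema, not Air Mozilla’s exact code):

```python
# SQL a migration would run; with South (current at the time) via
# db.execute(sql) in forwards(), with modern Django via
# migrations.RunSQL(sql). Table and column names are illustrative.
CREATE_INDEXES = [
    "CREATE INDEX event_title_fts ON events_event "
    "USING gin(to_tsvector('english', title))",

    "CREATE INDEX event_body_fts ON events_event "
    "USING gin(to_tsvector('english', "
    "coalesce(short_description, '') || ' ' || coalesce(description, '')))",
]
```

PostgreSQL maintains these expression indexes itself, which is what makes the queries below fast without any indexing code in the application.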

At the moment Air Mozilla only has English content, but some day there might be more languages. How to add indexes for different languages is pretty clear; you run the same migration as above with different languages named.

That means that any inserts, updates or deletions automatically update the full-text index for these columns in the database. We don’t have to worry about this at all, at any point in the ORM code. It just works!

Now, let’s explain how the search works. A user types in a search query. E.g. “community”.

What we want to do is to return an ORM QuerySet that:

  • contains all events that the user is allowed to see, depending on privacy or publishing-workflow criteria, and
  • whose title or short_description or description contains the search term.

And we want matches in the title to be ranked higher than matches in the short_description or description. So let’s add that to the filtering:

Now, that satisfies the “where part”. Next, we need to do something about the ranking, so we extend the code with this:

Last but not least, we want to let PostgreSQL work out the highlighting of matches so you can show extracts on the search result page with the matched words emphasized. So you extend select with some more code to look like this:
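The code snippets above were lost from this excerpt; pieced together, the query can look roughly like the following sketch using Django’s QuerySet.extra(). The SQL, weights and column names are assumptions reconstructed from the prose, though the annotation names title_highlit and desc_highlit match those the template relies on:

```python
# The where clause, rank expression and headline expressions for the
# events search. NULL columns would need coalesce() in practice.
MATCH_SQL = (
    "to_tsvector('english', title || ' ' || short_description || ' ' "
    "|| description) @@ plainto_tsquery('english', %s)"
)
RANK_SQL = (
    "ts_rank(setweight(to_tsvector('english', title), 'A') || "
    "setweight(to_tsvector('english', description), 'B'), "
    "plainto_tsquery('english', %s))"
)
HEADLINE_SQL = "ts_headline('english', {col}, plainto_tsquery('english', %s))"

def search_events(qs, term):
    """Filter, rank and highlight `qs` (a QuerySet already restricted to
    events the user may see) against the search term."""
    return qs.extra(
        where=[MATCH_SQL],
        params=[term],
        select={
            "rank": RANK_SQL,
            "title_highlit": HEADLINE_SQL.format(col="title"),
            "desc_highlit": HEADLINE_SQL.format(col="description"),
        },
        # one %s placeholder per select entry, in insertion order
        select_params=[term, term, term],
        order_by=["-rank"],
    )
```

setweight() is what pushes title matches above body matches in the ranking, and ts_headline() produces the emphasized extracts.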

And there you have it. Note that PostgreSQL inserts HTML markup into these title_highlit and desc_highlit extra annotations, and it also escapes any pre-existing HTML, so they’re safe to display in raw form in the Django template code. So it can look like this in the search results template:

In plain PostgreSQL SQL there are actually ways to “combine” the rank calculation with the “where criteria” so that you don’t have to do both the rank calculation and the where operation separately. However that’s way out of scope for the Django ORM API and even though it’s possible to achieve, the code will quickly get messy.

So, how long does it take to do this query? On my laptop, with a snapshot of the production database containing over 600 events, that big query takes 30-35 milliseconds. That’s fast enough.

Migrating Air Mozilla from MySQL to PostgreSQL

Peter Bengtsson

Before we dig into the how let’s take a look at the why.

From the beginning, Air Mozilla has been a straightforward Django project that uses the ORM without requiring any database-specific features. It didn’t really matter what database you used. Here at Mozilla, we currently prefer MySQL because we have a rock-solid and mature infrastructure set up around running it. (Thank you, database team!)

This week we launched full-text search in Air Mozilla. Here’s an example search. PostgreSQL supports very powerful features specifically for full-text search, including stemming, highlighting, ranking and custom dictionaries. (Note: MySQL has full-text search indexing too as of MySQL 5.6 but it does not yet support stemming or highlighting).

So how did we migrate the database? In short, with this tool: py-mysql2pgsql. Internally, the tool connects to both MySQL and PostgreSQL and reads one table and record at a time, converting it over to PostgreSQL. You can check out the code on GitHub.

To run it, all you have to do is fill in the connection details for both MySQL and PostgreSQL in a YAML file, and you should end up with a working “clone”.

There was one caveat though that irked me. MySQL does not support timestamps with time zone, but PostgreSQL does. Django can work around this by applying the time zone via a Django settings variable. By having the time zone information in the database, we don’t have to fake the time zone information any more. It’s also bound to be more performant, because the conversion to aware datetimes moves nearer to the database. To make this change, we wrote a simple conversion script that you can afterwards throw into PostgreSQL. You can see the rest of the instructions for the migration here.
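The conversion script itself isn’t shown above; a minimal Python sketch that emits the ALTER TABLE statements one would feed to PostgreSQL (table and column names here are illustrative, not Air Mozilla’s real schema):

```python
def timestamptz_sql(table_columns, tz="UTC"):
    """Emit ALTER TABLE statements converting naive timestamp columns to
    `timestamp with time zone`, interpreting the stored values as `tz`
    (the zone Django was previously applying in settings)."""
    template = (
        "ALTER TABLE {table} ALTER COLUMN {col} "
        "TYPE timestamp with time zone "
        "USING {col} AT TIME ZONE '{tz}';"
    )
    return "\n".join(
        template.format(table=t, col=c, tz=tz) for t, c in table_columns
    )

print(timestamptz_sql([("main_event", "start_time"),
                       ("main_event", "created")]))
```

The USING … AT TIME ZONE clause is what reinterprets the existing naive values instead of merely changing the column type.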

And here’s a little bonus: the migration itself took approximately 10-15 seconds to move over 25,000 rows across 42 tables. That’s connecting to a MySQL and a PostgreSQL instance in two different locations in the same data center.

In a follow-up post I will try to explain more about how we do the full-text search in PostgreSQL with Django.

Beer and Tell – September 2013 Edition

Michael Kelly

Gather ’round, children. Your distant cousin mkelly is going to share a tale of excitement and mystery, of heroes and villains, of action and adventure.

That’s right, it’s the Beer and Tell Recap! You can also check out the wiki page or the recording.


Simon Wex and Robert Richter from the Mozilla Foundation presented Appmaker, an experiment into whether we can make it fun for non-developers to quickly make working apps. Widgets send “mail” to other widgets, which triggers them to do things like display cat pictures or take a photo. You can track development of Appmaker on their GitHub repo.


My own Beer and Tell project is diecast, a grunt-init template for single-page frontend apps. diecast sets you up with Grunt commands for building and publishing your site to Github Pages, and includes require.js for JavaScript module loading and LESS for CSS preprocessing. It also uses Bower for downloading JavaScript libraries that you want to use. It’s a cornucopia of JavaScript buzzwords!

Brian Brennan showed us a community of learning focused on node.js. It’s built around terminal-based challenges where you write code to solve realistic problems. Planned improvements include user accounts and open badge support.

OpenCL in the Browser

Scott Michaud presented a super-secret project (no links, sorry) that demoed a software renderer powered by OpenCL code running in Firefox. Presumably he was using the WebCL prototype plugin released by Nokia Research recently. Check it out!

Dennis Dubstep

While he was absent and thus didn’t present, Will Kahn-Greene still added a screenshot of SUMO localized to dubstep on the wiki. This was achieved using a dubstep locale in dennis, a set of localization tools that can, among other things, help you test how your site looks with the excessively long strings certain languages may produce.


Matt Basta told us about Crass, a CSS minifier that fully parses CSS instead of applying textual transformations as most minifiers do. This allows it to perform optimizations that most other minifiers can’t, such as reordering properties and transforming values. Crass can also pretty-print the parsed CSS, and is written in both Python and JavaScript.


Basta also shared Panopticon (app is down as of writing), which is sort of like Skype and IRC combined. You join a room, select a user in that room, and get to see an animated GIF captured from their webcam, live. There’s a Github repository as well, for those who are interested or confused.

Updated Nunjucks Documentation

James Long demoed an upcoming update to the documentation for Nunjucks, a jinja2-inspired templating system for JavaScript. Keep an eye out for the update, which includes a full overview of the template language and API!


Finally, Michael Cooper presented BATCH, a short programming game written in JavaScript. The main purpose of the game was to see if he could write a Beer and Tell project starting at noon on the day of the event. I highly recommend that other people try this as well, if for nothing else than to pad the Beer and Tell project list for next month.

See y’all next month!

Where mozilla.org and Firefox Intersect

Holly Habstritt Gaal

written with Chris More and Jennifer Bertsch

Engagement’s Web Productions team has grown from a small team of technical project managers, to a multidisciplinary web development team that initiates projects, uses metrics and qualitative testing to learn from our users, and has an iterative approach to web development. We’ve had a chance to see the influence that this growth, paired with the collaboration of teams across Mozilla, has had on our work. We can be pre-emptive instead of reactionary, share our knowledge and tools, and facilitate design process and collaboration. We would like to reintroduce ourselves to Mozilla and the UX community, expose where our team intersects with the user experience of our products, and invite you to collaborate with us.

Not your typical web product

mozilla.org does not exist solely for marketing our products. It is unique among most websites in that, to support our products and users when they need us, we must stay in line with the roadmaps and release cycles of our products. For example, there are many touch points with our users that take place on mozilla.org, some of which are part of the onboarding process. Onboarding is more than just downloading our products: it extends to the first “unboxing” experience, updating Firefox, and sharing helpful information about new product features. This ultimately contributes to retention and to an understanding of our Firefox and Mozilla brands.

In our user tests, we’ve found that users are more likely to respond positively when they have an expectation of both when and how a message is delivered. For mozilla.org and our products, this expectation can be set by previewing an upcoming feature or new design on mozilla.org for Firefox users. It can also be handled with a consistent pattern for how we present content updates, notifications, and new features across all of our products. The WebProd team doesn’t accomplish this alone, and this is one example of how our users can benefit from our teams staying connected.

Staying connected: the intersection of our roles and roadmaps
What I’ve found at Mozilla is that separate teams are often working on similar challenges and share common goals. Collaborating across teams has been a great way to meet and learn from each other and is key to addressing our intersecting issues efficiently while ultimately creating a better end product. Most recently we have worked with SUMO to stay better aligned on presenting Firefox help messaging and we have also been collaborating on a cross-team effort to improve the First Run and Update experiences.

At Mozilla we are all part of the chain of reactions that results in what our users experience. The WebProd team has been keeping the following in mind to better support Firefox users:

  • Align our websites to product roadmaps so we can offer support to our end users
  • Optimize user onboarding flows
  • Work in parallel with Engagement and Product teams’ goals
  • Ensure website content is localized in many languages
  • Complete migration of all legacy pages to Bedrock and our Sandstone theme, which is responsive by nature.
  • Support users on any device or operating system
  • Evaluate > Test > Improve

A significant way we can all support our users is to recognize the intersections between our teams at Mozilla and the overlapping initiatives in our roadmaps. If the WebProd team can collaborate with you to create a better experience for our users, don’t hesitate to reach out to us.

We’re easy to find!

DXR gets faster hardware, VCS integration, and snazzier indexing

Erik Rose


DXR is Mozilla’s fast, full-featured code search tool for doing structured queries, free-text searches, and regex matching on huge codebases, like Firefox’s. Since my last post, we’ve completely replaced the production hardware, integrated with VCSs, and made scads of indexing improvements. The last quarter’s highlights include…

  • New stage and production hardware, with our own dedicated build box so we can support multiple codebases
  • Nightly updates from the mozilla-central tree, for both stage and prod
  • No more hours of downtime when a mozilla-central build fails
  • Blame, log, diff, and raw-file links in the sidebar
  • Much better JS syntax highlighting
  • Linkifying much more C++ template stuff: references to class templates, template params, and base classes of class templates when the base class is also a template and dependent on the type params of the derived class template (phew!)
  • Searching for namespaces and namespace aliases
  • Better finding of C definitions and declarations
  • Clang 3.3 support

Tell us what you want.

Now we want to hear from you. We’ve got a big, juicy fourth quarter coming up, and we want to make you happy. What would you, as a current or potential DXR user, like to see happen? Some ideas already high on our list are…

  • Indexing multiple trees, like Aurora and comm-central, eventually targeting the full list from MXR
  • A UI refit that will resolve client-side bugs, improve consistency, and add power. Take a look at the wireframes!
  • Structured search for JS: finding function definitions and calls, variable refs, and so on

What are your highest priorities? What would help you hack better on Mozilla code today? Leave a comment about what’s most important to you, whether it’s in the above list or not, and we will build our Q4 goals based on what you say.

Also, I’ll be manning a table at the Innovation Fair at the Mozilla Summit in Santa Clara. Stop by and make DXR wishes in person!

Finally, thanks to the people who make DXR possible: fubar, for all his ops work; and jcranmer, abbeyj, nrc, Bruce Stephens, jonasac, and nicolaisi, who keep the patches rolling in faster than I can review them!