This is the final post in a series of three posts on Next-Generation Web Localization at Mozilla. If you haven’t read the previous posts, please start with the first one. Again, this is a discussion starter and nowhere close to a final decision. Your opinion on the issue is very valuable, so please leave comments.
In part 2 of the series, I explained some organizational challenges in growing a web localization team healthily. In this post, I want to outline some of the issues localization tools are facing on the web.
Note: In this article, the term localization tools refers to developer tools that human beings can use to localize web applications. This is unrelated to automatic translation tools like Google Translate or Yahoo Babel Fish.
The thin line between developer and localizer
Most translatable Web content at Mozilla is represented as GNU gettext .po files that are part of the regular source code. Localizing web applications is therefore still very much a technical task: Checking out a source tree, editing a .po file, checking it back in, (optionally waiting for the staging server to pick up the changes and checking out one’s changes online), that sounds very much like a developer’s work, not as much like the core concern of a localizer.
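For readers unfamiliar with the format, a gettext .po file is simply a list of message entries, each pairing a source string with its translation. The strings below are invented for illustration, not taken from an actual Mozilla file:

```po
#: templates/index.html:12
msgid "Download Firefox"
msgstr "Firefox herunterladen"

#: templates/index.html:27
msgid "Learn more about add-ons."
msgstr ""
```

An empty msgstr marks a string as untranslated, which is how tools detect remaining work in a locale.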
A lot of tools do not try to change that. They are mere frontends to translating .po files, but they do not help you translate the website. One way to change this would be building a “PO live edit” tool, as my colleague Austin King describes in a recent blog post.
Please read his post for more details, but the idea is to build a translation tool that requires zero set-up, has a small learning curve, and ties web site translation together “end-to-end”, i.e., from development to deployment in one go.
.po or not .po?
Another problem we have is that not all of our pages can be fully localized through .po files alone. Most notably, the web site mozilla.com mainly consists of straight HTML files, tied together with some PHP. Other pages, like AMO, also keep some of their content in localizable HTML files. The rationale is that these texts would add a huge number of strings to the .po files if they were split up in that fashion, and it would be significantly harder for localizers to translate them out of context. Therefore, having localizers translate straight HTML from a file is considered the “lesser evil”.
The reason these pages are so different is that they do not only contain translatable content: they may also require the localizer to add or remove bullet points, or to replace links to a US-centered site with another locally applicable option.
Currently, we are not aware of a tool that makes these localizations easier. Some tools, like html2po, try to extract strings from an HTML file and drop them into a .po file (one string per block-level element, for example), with varying success. While this allows using .po-capable tools on the resulting files, it removes the flexibility to perform “content-changing localization” as described above. Finally, it also makes the .po file authoritative, i.e., if one localizer edits the generated .po file but another one prefers editing the HTML template, we are going to run into non-trivial merging problems.
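To make the extraction idea concrete, here is a rough sketch of the html2po approach in Python: collect the text of each block-level element and emit it as one msgid. The block-tag list and the .po serialization are simplified assumptions for illustration; this is not html2po’s actual code, and it shows exactly the limitation described above, since the HTML structure itself is no longer editable by the localizer.

```python
from html.parser import HTMLParser

# Simplified assumption: treat only these tags as translation units.
BLOCK_TAGS = {"p", "li", "h1", "h2", "h3", "td", "th"}

class BlockExtractor(HTMLParser):
    """Collect the text content of each block-level element."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # one extracted string per block element
        self._depth = 0    # how many block-level elements we are inside
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self._depth:
            self._depth -= 1
            text = " ".join("".join(self._buf).split())  # normalize whitespace
            if text:
                self.blocks.append(text)
            self._buf = []

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)

def extract_msgids(html):
    parser = BlockExtractor()
    parser.feed(html)
    return parser.blocks

def to_po(msgids):
    # Naive serialization; real tools also wrap long lines, deduplicate, etc.
    return "\n".join('msgid "%s"\nmsgstr ""\n' % s.replace('"', '\\"')
                     for s in msgids)
```

For example, `extract_msgids("<p>Hello <b>world</b></p>")` yields `["Hello world"]`: inline markup is flattened into the string, while the surrounding tags, attributes, and document structure are discarded and cannot be localized.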
Making HTML translation easier is definitely a “slippery slope” towards building a full-blown HTML editor, which we want to avoid by all means. However, I can imagine integrating Bespin with a web site instance and allowing localizers to collaboratively edit the HTML file right where it will show up eventually.
This is where you come in: Please leave comments. If you are a localizer yourself, please tell us what does or what does not work well for you. What is your workflow like? Are you happy with it or would you like to change it? If you have used or witnessed a tool before that you found provided an innovative solution to the problems outlined above, please tell us.
This series of blog posts was a result of many discussions with sethb, gandalf, stas, ozten, pascal, pike and others interested in making Web localization better. Thanks for your input, guys!
Photo Credit: “Old Tools”, CC by-sa licensed by Svadilfari on flickr.