Identifying translation issues is hard
Trying to measure translation quality is like asking someone to tell you whether a cake they’ve made tastes good. It may taste very good to you, but terrible to your friend. Even if the recipe was followed closely, each of us has our own criteria for what makes a “good” cake. If someone asked you to describe why the cake was or wasn’t good, you might struggle for the right words. Often it comes down to a gut feeling.
When you’re asked to evaluate a translation into your native language and describe whether the translation is good, you might find yourself struggling for words, leaving you to simply say, “The translation just doesn’t sound/feel right.” While this may be true, it doesn’t describe the issue with the translation or what needs to be corrected to make it better. Often we simply lack the right words to identify the translation issue.
MQM describes issue types
The Multidimensional Quality Metrics (MQM) standard provides a framework for identifying and categorizing translation issues. Its standardized issue classification gives reviewers the “right words” to describe what they see in a translation they’re reviewing. By using this standard terminology throughout the Mozilla l10n communities, reviewers can identify issues accurately, and localizers can then more easily triage each issue: either determine (according to the locale’s style guide) that it needs to be fixed, and how, or decide that it is a false positive and the translation was intentional. The goal of all of this is to produce a high quality localization by distinguishing between what truly needs to be fixed and what is an intentional translation choice within the community.
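To make the idea of standardized issue classification concrete, here is a small illustrative sketch (not an official MQM implementation, and the issue list is abridged) of tagging a review finding with a standard label instead of “it just doesn’t sound right”:

```python
# Illustrative sketch: a small subset of MQM-style dimensions and child
# issue types. The exact lists here are abridged examples, not the full
# MQM typology of 140+ issues mentioned in the post.
MQM_ISSUES = {
    "accuracy": ["mistranslation", "omission", "addition", "untranslated"],
    "fluency": ["grammar", "spelling", "typography", "inconsistency"],
    "terminology": ["inconsistent-terminology", "wrong-term"],
    "style": ["awkward", "unidiomatic"],
}

def annotate(segment_id, dimension, issue, note=""):
    """Record a review finding using a standardized issue label."""
    if issue not in MQM_ISSUES.get(dimension, []):
        raise ValueError(f"unknown issue type: {dimension}/{issue}")
    return {"segment": segment_id, "dimension": dimension,
            "issue": issue, "note": note}

# A reviewer flags a hypothetical segment as left untranslated:
finding = annotate("browser/menu:quit", "accuracy", "untranslated",
                   note="English string left as-is")
print(finding["issue"])  # untranslated
```

A record like this is easy for a localizer to triage: the dimension and issue type say exactly what kind of problem the reviewer believes they found.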
At the beginning of 2016, we formed the Translation Quality Team (made up of chofmann, Axel, Peiying, Delphine, and myself) and thoroughly investigated how to adopt the MQM standard into the l10n workflow during the translation review phase. We started with the hypothesis that we needed to consult with localization communities to define a single, “one-size-fits-all” list of MQM translation issues that could be applied across all locales. This list would have been based on translation issues that each community considered to be either most common, most appropriate for their translation work, or part of the criteria they already used to identify translation issues. We learned a number of lessons while working under this hypothesis, which the sections below describe.
Who reviews translations?
Each Mozilla l10n community consists of both translators (or localizers) and trusted reviewers. These reviewers come from many sources. They range from members of each Mozilla l10n community to end users with technical or linguistic background who occasionally submit l10n bugs. We hope to create a common terminology between localizers and all of these reviewers. This ultimately makes providing feedback more effective by clearly communicating where translation improvements can possibly be made.
We hope that adopting this standard could help us to expand this set of reviewers for each localization. One of the principles behind open source software is that by making code (or strings) publicly transparent, we can increase the number of eyes reviewing the code. This same principle applies to open source localization. Additionally, by creating standard tools, processes, and language for providing feedback, we increase the accuracy of that feedback. With common terms and processes for performing these reviews, we can empower more eyes to review localizations. Who this expanded community of reviewers might be is still unclear, and we welcome your feedback here. By adopting the MQM standard and creating good, language-specific style guides, we hope to improve the communication between reviewers and localizers, resulting in better localized products for end users.
Style guides are a must for l10n quality
With the need for style guides in mind, we created an inventory of all of the existing style guides used in each l10n community. We also noted the issues they warned localizers against committing in their translations. It was great to see that there was a large amount of overlap between what these l10n communities have been using to perform reviews and the MQM standard!
It’s no secret that creating a style guide can be a difficult task. We experimented with a method of bootstrapping a style guide in l10n hackathons using the MQM standard as well as the existing l10n style guides in the l10n community as an inspiration. This was a success for the pt-BR localization community and has inspired the Celtic communities to quickly stand up their own style guides.
This week, the Translation Quality Team met in Utah to create a Mozilla-specific l10n style guide. This style guide is broken up into two main parts: the first contains rules that are language-specific and must be defined by each Mozilla l10n community (covering language-specific style, terminology, grammar, and units); the second contains general rules that we have defined for localizers of all languages (covering principles of translating with accuracy and fluency). It is the result of hours spent reviewing a list of more than 140 MQM translation issues to identify which applied to Mozilla localization and where they overlapped with existing l10n community style guides. We feel that this style guide represents the definition of a good Mozilla localization.
For now, this style guide is available on MDN. Please read through it and provide us with feedback. In the near future, we hope to have repositories for Mozilla language resources, like these style guides, for each locale.
Test design is going to require a lot of work
Style guides are one way for us to adopt the MQM standard into the Mozilla l10n workflow. The next is to build tools that make performing translation reviews easy: easy to identify specific issues, easy to give that feedback to a translator, easy to report an issue to correct, and easy to benchmark translation quality for a l10n community’s project. Another very important lesson we learned concerned the design of these tools, which are essentially designed around testing. Test design is a major factor in how we define a translation quality metric of issues. Performing a translation review is primarily a manual task, and it can be quite challenging, even inaccurate, if the task itself is poorly designed.
Additionally, we have to consider the “test math”: the number of strings to review and what they cover, the number of eyes reviewing those strings, the amount of time that allows for good reviews, and the higher risk of issues introduced in longer strings compared to shorter strings. For example, Firefox desktop has about 10,000 strings. At one minute per string, it would take one person roughly 170 hours to review them all for a single locale. Multiply that by 90 locales and 40 issue types to evaluate, and you’ll see that this has the potential to be a massive effort!
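The back-of-the-envelope arithmetic, using the figures from the text (about 10,000 strings, one minute per string, 90 locales), looks like this:

```python
# "Test math" for review effort, using the rough figures from the post.
strings = 10_000            # approximate string count for Firefox desktop
minutes_per_string = 1      # assumed review time per string
locales = 90

hours_per_locale = strings * minutes_per_string / 60
total_hours = hours_per_locale * locales

print(f"{hours_per_locale:.0f} hours per locale")      # ~167 hours
print(f"{total_hours:.0f} hours across all locales")   # ~15,000 hours
```

And that total still treats each string as a single pass; checking each string against dozens of distinct issue types would push the effort higher still, which is why tool support and automation matter so much.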
In order to create features or tools that promote good reviews without exhausting reviewers, we have to consider all of these factors. I wish we could say we’ve come up with a solution here, but we have not. Axel has been experimenting with adding this type of review feature to Pontoon, but it is still in the very early stages (see photo).
There are also ways we can create automated testing that looks for specific translation issues, eliminating a lot of manual review work. This actually inspired some of the new features added to Transvision, such as the Unlocalized Words and Translation Consistency views, which look at untranslated words and terminology consistency.
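As a toy sketch of the kind of automated check described here (not how Transvision is actually implemented), a tool can flag source-language words copied verbatim into the target string, minus an approved “do not translate” term list. The term list and strings below are made up for illustration:

```python
# Toy untranslated-words check. Brand names stay untranslated by policy,
# so they go on an allow list; anything else copied verbatim gets flagged.
DO_NOT_TRANSLATE = {"Firefox", "Mozilla", "Pontoon"}

def untranslated_words(source, target, allowed=DO_NOT_TRANSLATE):
    """Return source words that appear verbatim in the target,
    excluding terms that are allowed to remain untranslated."""
    src_words = set(source.split())
    tgt_words = set(target.split())
    return sorted((src_words & tgt_words) - allowed)

# Example: "Bookmarks" was left in English in a hypothetical pt-BR string.
flags = untranslated_words("Open Bookmarks in Firefox",
                           "Abrir Bookmarks no Firefox")
print(flags)  # ['Bookmarks']
```

A real check needs tokenization, case folding, and per-locale term lists, but even this crude version shows how a reviewer’s attention can be directed to a handful of suspect strings instead of all 10,000.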
How can I start improving my l10n quality?
This is a complicated task that requires involvement and feedback from a lot of people. If after reading this you say to yourself, “I gotta jump in and help with this!” PLEASE DO! Here are some ways that you can get involved:
For your own locale:
- Look at your locale’s list of untranslated words in Transvision’s new Unlocalized Words view. Decide which of those should remain untranslated and add them to a term list. Fix those terms that should be translated.
- Look at your locale’s list of inconsistent translations in Transvision’s new Translation Consistency view. Decide if the translations are correctly translated for each context in which they appear in the software. Fix those that need to be consistent.
- Get together with your community and try writing a draft of your own style guide following the Mozilla l10n style guide’s instructions. Tell us if it is too long, too short, filled with too much jargon, or if the call to action in the first section requires too much effort to complete.
For the benefit of the Mozilla l10n program:
- Submit patches to Transvision features.
- Help test quality-specific features in Transvision, Pontoon, or Pootle and submit bugs when they’re found.
- Follow the new Bugzilla component Localization Tools and Infrastructure :: Quality and comment on bugs that you feel you can help with.
- Join the weekly translation quality meeting on Wednesdays at 17:00 UTC in the Localization Vidyo Room.
Once you have a Mozilla l10n style guide for your language, start performing your own translation reviews within your community. Write blog posts about the experience for other localizers around the world to learn from you.