Moving the needle on translation quality

Identifying translation issues is hard

Trying to measure translation quality is like asking someone to tell you if a cake they’ve made tastes good. It may taste very good to you, but it may taste terrible to your friend. Despite the fact that the cake’s recipe might have been followed very closely, each of us has our own unique set of criteria of what makes a “good” cake. If someone were to ask you to describe why the cake was or was not good, you may struggle for the right words. It often comes down to a gut feeling that determines whether or not it’s good.

When you’re asked to evaluate a translation into your native language and describe whether the translation is good, you might find yourself struggling for words, leaving you to simply say, “The translation just doesn’t sound/feel right.” While this may be true, it doesn’t describe the issue with the translation or what needs to be corrected to make it better. Often we simply lack the right words to identify the translation issue.

MQM describes issue types

The Multidimensional Quality Metrics (MQM) standard provides a framework for identifying and categorizing translation issues. Through standardized issue classification, reviewers are given the “right words” to describe issues they see in a translation they’re reviewing. By using this standard terminology throughout the Mozilla l10n communities, reviewers are able to accurately identify issues within an l10n community’s work. Localizers are then more easily able to triage each issue and determine (according to the locale’s style guide) either that it needs to be fixed and how to fix it, or that it is a false positive and the translation is intentional. The goal of all of this is to produce a high quality localization by distinguishing between what truly needs to be fixed and what is an intentional method of translation within the community.

At the beginning of 2016, we formed the Translation Quality Team (made up of chofmann, Axel, Peiying, Delphine, and myself) and thoroughly investigated how to adopt the MQM standard into the l10n workflow during the translation review phase. We started with the hypothesis that we needed to consult with localization communities to define a single, “one-size-fits-all” list of MQM translation issues that could be applied across all locales. This list would have been based on translation issues that each community considered to be either most common, most appropriate for their translation work, or part of the criteria they already used to identify translation issues. We learned a number of lessons while working under this hypothesis:

The first was that there isn’t a “one-size-fits-all” list that we could create.
There is a lot of specialized jargon in the MQM list of issues that needs to be generalized and simplified. We began simplifying the names and definitions of MQM issue types that we felt were jargon.
The MQM standard lists issues within a hierarchy, which makes it easy to separate scoring a translation with a specific “quality grade” from identifying specific issues that need fixing. For example, Spelling issues are part of the Fluency root issue. By marking an issue as a spelling issue, we can identify that action needs to be taken to correct the spelling issue as well as calculate a score for Fluency based on the number of spelling issues.
Finally, in order for this list of issues to be useful, we first need to have standardized resources against which the translations can be evaluated. Resources like language-specific Mozilla style guides and termbases (or glossaries) are necessary for anyone to perform an objective translation quality review.
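
To make the hierarchy point concrete, here is a minimal sketch of how one list of review findings could drive both the per-issue fixes and a per-root count. Only the Spelling-under-Fluency relationship comes from the text above; the other mappings, the string ids, and the `summarize` helper are illustrative assumptions, not part of any Mozilla tool.

```python
from collections import Counter

# Hypothetical, simplified slice of an issue hierarchy: each specific issue
# type maps to its root issue. Only spelling -> fluency is taken from the
# post; the remaining entries are illustrative assumptions.
ISSUE_ROOT = {
    "spelling": "fluency",
    "grammar": "fluency",
    "mistranslation": "accuracy",
    "untranslated": "accuracy",
}

def summarize(findings):
    """Return (counts per specific issue type, counts per root issue)."""
    by_type = Counter(issue_type for _, issue_type in findings)
    by_root = Counter(ISSUE_ROOT[issue_type] for _, issue_type in findings)
    return by_type, by_root

# Each finding is (string id, issue type); the ids are made up.
findings = [
    ("menu.save", "spelling"),
    ("menu.open", "spelling"),
    ("prefs.title", "mistranslation"),
]

by_type, by_root = summarize(findings)
print(by_type["spelling"])  # 2 spelling issues to fix
print(by_root["fluency"])   # 2 issues counted against Fluency
print(by_root["accuracy"])  # 1 issue counted against Accuracy
```

The same findings answer two questions: which strings need a specific fix, and how many issues fall under each root when it comes time to assign a grade.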

Who reviews translations?

Each Mozilla l10n community consists of both translators (or localizers) and trusted reviewers. These reviewers come from many sources. They range from members of each Mozilla l10n community to end users with technical or linguistic background who occasionally submit l10n bugs. We hope to create a common terminology between localizers and all of these reviewers. This ultimately makes providing feedback more effective by clearly communicating where translation improvements can possibly be made.

We hope that adopting this standard could help us to expand this set of reviewers for each localization. One of the principles behind open source software is that by making code (or strings) publicly transparent, we can increase the number of eyes reviewing the code. This same principle applies for open source localization. Additionally, by creating standard tools, processes, and language for providing feedback, we increase the accuracy of that feedback. With common terms and processes for performing these reviews, we can empower more eyes to review localizations. Who this expanded community of reviewers might be is still unclear, and we welcome your feedback here. By adopting the MQM standard and creating good, language-specific style guides, we hope to improve the communication between reviewers and localizers, resulting in better localized products for end users.

Style guides are a must for l10n quality

With the need for style guides in mind, we created an inventory of all of the existing style guides used in each l10n community. We also noted the issues they warned localizers against committing in their translations. It was great to see that there was a large amount of overlap between what these l10n communities have been using to perform reviews and the MQM standard!

It’s no secret that creating a style guide can be a difficult task. We experimented with a method of bootstrapping a style guide in l10n hackathons using the MQM standard as well as the existing l10n style guides in the l10n community as an inspiration. This was a success for the pt-BR localization community and has inspired the Celtic communities to quickly stand up their own style guides.

This week, the Translation Quality Team met in Utah to create a Mozilla-specific l10n style guide. This style guide is broken up into two main parts: the first contains rules that are language-specific and must be defined by each Mozilla l10n community (covering language-specific style, terminology, grammar, and units); the second contains general rules that we have defined for localizers of all languages that can help you translate well (covering principles of translating with accuracy and fluency). This was a result of us spending hours reviewing a list of 140+ MQM translation issues to identify what applied to Mozilla localization and the overlap in existing l10n community style guides. We feel that this style guide represents the definition of a good Mozilla localization.

For now, this style guide is available on MDN. Please read through it and provide us with feedback. In the near future, we hope to have repositories for Mozilla language resources, like these style guides, for each locale.

Test design is going to require a lot of work

Style guides are one way for us to adopt the MQM standard into the Mozilla l10n workflow. The next is to build tools that make performing translation reviews easy: easy to identify specific issues, easy to give that feedback to a translator, easy to report an issue to correct, and easy to benchmark translation quality for a l10n community’s project. Another very important lesson we learned was about the design around these tools. Essentially, these tools are designed around testing. Test design is a major factor in how we define a translation quality metric of issues. Performing a translation review is primarily a manual task and can be quite challenging and even inaccurate if the design of the task is either:

overwhelming (e.g., asking a localizer to look at 1000 translations for up to 30 issues at once),
too small in scope (e.g., asking a localizer to look at 1000 translations for up to 30 issues, moving through those translations repeatedly looking for one issue at a time),
too long (e.g., a review session that runs past 60 minutes, the ideal upper limit for reviewing translations without mental fatigue),
cluttered with irrelevant issues instead of focused on common ones (common issues would be found in terminology, consistency, mistranslations, intentionally untranslated words, word order, access keys, tone, punctuation, capitalization, verb tense, spelling, and grammar),
too complicated (e.g., the translation review tool for performing the task has a high learning curve),
or too boring.
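
As a rough illustration of designing around the length constraint, the sketch below splits a review queue into sessions that fit a time budget. The `plan_sessions` helper and the one-minute-per-string pace are assumptions made for the example; this is not a feature of any existing review tool.

```python
def plan_sessions(string_ids, minutes_per_string=1, max_minutes=60):
    """Split a review queue into sessions that fit a time budget.

    Both defaults are assumptions: one minute per string, and sessions
    capped at 60 minutes to limit mental fatigue.
    """
    per_session = max(1, max_minutes // minutes_per_string)
    return [
        string_ids[i:i + per_session]
        for i in range(0, len(string_ids), per_session)
    ]

# A queue of 1000 strings, identified here by made-up integer ids.
sessions = plan_sessions(list(range(1000)))
print(len(sessions))      # 17 sessions
print(len(sessions[0]))   # 60 strings in a full session
print(len(sessions[-1]))  # 40 strings in the final session
```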

Additionally, we have to consider the “test math.” This means determining the number of strings to review and what they cover, the number of eyes reviewing those strings, the amount of time that is optimal for good reviews, and the calculated risk factor of issues introduced in longer strings compared to shorter strings. For example, consider the fact that Firefox desktop has about 10,000 strings. Assuming one minute to review each string, it would take one person about 170 hours to review all of these strings for a single locale. Multiply that by 90 locales, with up to 40 issue types to evaluate, and you’ll see that this has the potential to be a massive effort!
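
The arithmetic behind that estimate can be sketched as follows. The string count, locale count, issue-type count, and one-minute review time come from the paragraph above; reading the roughly 170 hours as a per-locale figure is our interpretation, and the variable names are only for illustration.

```python
STRINGS = 10_000         # approximate Firefox desktop string count
LOCALES = 90
ISSUE_TYPES = 40
MINUTES_PER_STRING = 1   # assumed review time per string

# Hours for one person to review every string in a single locale.
hours_per_locale = STRINGS * MINUTES_PER_STRING / 60

# Hours to review every string in every locale.
hours_all_locales = STRINGS * LOCALES * MINUTES_PER_STRING / 60

# Number of (string, locale, issue type) combinations a full review covers.
checks = STRINGS * LOCALES * ISSUE_TYPES

print(round(hours_per_locale))   # 167, i.e. roughly 170 hours
print(round(hours_all_locales))  # 15000
print(checks)                    # 36000000
```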

In order to create features or tools that promote good reviews without exhausting reviewers, we have to consider all of these factors. I wish we could say we’ve come up with a solution here, but we have not. Axel has been experimenting with adding this type of review feature to Pontoon, but it is still in the very early stages (see photo).

There are also ways we can create automated testing that looks for specific translation issues, eliminating a lot of manual review work. This actually inspired some of the new features added to Transvision that look at untranslated words and terminology consistency. The Unlocalized Words view allows you to look at the list of untranslated words for a specific locale’s repository. It shows you how often they appear throughout the repo and allows you to sort by frequency. The Terminology Consistency view shows where two different translations have been given for the same string in English. The English string may appear in two places within the repo, but the translation in each place is inconsistent. We will continue to look for ways to automate where possible.
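
To give a feel for what these automated checks involve, here is a minimal sketch of both ideas over a list of (English, translation) pairs. It is not Transvision’s actual implementation; the function names, the naive whitespace tokenization, and the sample French strings are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def inconsistent_translations(pairs):
    """Map each English string to its translations when they differ."""
    seen = defaultdict(set)
    for source, target in pairs:
        seen[source].add(target)
    return {src: sorted(tgts) for src, tgts in seen.items() if len(tgts) > 1}

def unlocalized_words(pairs):
    """Count English words that reappear verbatim in the translation,
    most frequent first. Splitting on whitespace is a simplification."""
    counts = Counter()
    for source, target in pairs:
        target_words = set(target.split())
        for word in source.split():
            if word in target_words:
                counts[word] += 1
    return counts.most_common()

# Made-up sample data for a French locale.
pairs = [
    ("Bookmarks", "Marque-pages"),
    ("Bookmarks", "Signets"),
    ("Open Firefox", "Ouvrir Firefox"),
    ("About Firefox", "À propos de Firefox"),
    ("Save", "Enregistrer"),
]

print(inconsistent_translations(pairs))
# {'Bookmarks': ['Marque-pages', 'Signets']}
print(unlocalized_words(pairs))
# [('Firefox', 2)]
```

Neither result is automatically an error: “Firefox” is meant to stay untranslated, which is why findings like these still need a localizer to triage them against the locale’s term list and style guide.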

How can I start improving my l10n quality?

This is a complicated task that requires involvement and feedback from a lot of people. If after reading this you say to yourself, “I gotta jump in and help with this!” PLEASE DO! Here are some ways that you can get involved:

For your own locale:

Look at your locale’s list of untranslated words in Transvision’s new Unlocalized Words view. Decide which of those should remain untranslated and add them to a term list. Fix those terms that should be translated.
Look at your locale’s list of inconsistent translations in Transvision’s new Translation Consistency view. Decide if the translations are correctly translated for each context in which they appear in the software. Fix those that need to be consistent.
Get together with your community and try writing a draft of your own style guide following the Mozilla l10n style guide’s instructions. Tell us if it is too long, too short, filled with too much jargon, or if the call to action in the first section requires too much effort to complete.

For the benefit of the Mozilla l10n program:

Submit patches to Transvision features.
Help test quality-specific features in Transvision, Pontoon, or Pootle and submit bugs when they’re found.
Follow the new Bugzilla component Localization Tools and Infrastructure :: Quality and comment on bugs that you feel you can help with.
Join the weekly translation quality meeting on Wednesdays at 17:00 UTC in the Localization Vidyo Room.

Once you have a Mozilla l10n style guide for your language, start performing your own translation reviews within your community. Write blog posts about the experience for other localizers around the world to learn from you.

3 comments on “Moving the needle on translation quality”

Business translation services wrote on September 10, 2016 at 6:53 am:

Wow such a great information, we all need a great interpretation to give our views to others.
Really helpful.

Indiahindiblog.com wrote on February 21, 2017 at 6:32 am:

It was picked up by a very good move. This is very important information for users of Word went by you is given. You can work on the Hindi translation-related. Hindi which will prove very helpful for users.

abdulkaium wrote on March 15, 2017 at 9:34 pm:

best&better
