Transforming Translations: How LLMs Can Help Improve Mozilla’s Pontoon

Image generated by DALL-E 3

Imagine a world where language barriers do not exist; a tool so intuitive that it can understand the subtleties of every dialect and the jargon of any industry.

While we’re not quite there yet, advancements in Large Language Models (LLMs) are bringing us closer to this vision.

What are LLMs: Beyond the Buzz

2024 is buzzing with talk about “AI,” but what does it actually mean? Artificial Intelligence, especially LLMs, isn’t just a fad — it’s a fundamental shift in how we interface with technology. You’ve likely interacted with AI without even realizing it — when Google auto-completes your searches, when Facebook suggests who to tag in a photo, or when Netflix recommends what you should watch next.

LLMs are a breed of AI designed to understand and generate human language by analyzing vast amounts of text. They can compose poetry, draft legal agreements, and yes, translate languages. They’re not just processing language; they’re understanding context, tone, and even the subtext of what’s being written or said.

The Evolution of Translation: From Machine Translation to LLMs

Remember the early days of Google Translate? You’d input a phrase in English and get a somewhat awkward French equivalent. This was typical of statistical machine translation, which relied on vast amounts of bilingual text to make educated guesses. It was magic for its time, but it was just the beginning.

As technology advanced, we saw the rise of neural machine translation, which used AI to better understand context and nuance, resulting in more accurate translations. However, even these neural models have their limitations.

Enter LLMs, which look at the big picture, compare multiple interpretations, and can even consider cultural nuances before suggesting a translation.

Pontoon: The Heart of Mozilla’s Localization Efforts

Pontoon isn’t just any translation tool; it’s the backbone of Mozilla’s localization efforts, where a vibrant community of localizers breathes life into strings of text, adapting Mozilla’s products for global audiences. However, although Pontoon integrates various machine translation sources, these tools often struggle to capture the subtleties essential for accurate translation.

How do we make localizers’ jobs easier? By integrating LLMs to assist not just in translating text but in understanding the spirit of what’s being conveyed. And crucially, this integration doesn’t replace our experienced localizers who supervise and refine these translations; it supports and enhances their invaluable work.

Leveraging Research: Making the Case for LLMs

Our journey began with a question: How can we enhance Pontoon with the latest AI technologies? Diving into research, we explored various LLM applications, from simplifying complex translation tasks to handling under-represented languages with grace.

To summarize the research:

Performance in Translation: Studies like “Large Language Models Are State-of-the-Art Evaluators of Translation Quality” by Tom Kocmi and Christian Federmann demonstrated that LLMs, specifically GPT-3.5 and larger models, exhibit state-of-the-art capabilities in translation quality assessment. These models outperform other automatic metrics in quality estimation without a reference translation, especially at the system level.
Robustness and Versatility: The paper “How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation” by Amr Hendy et al. highlighted the competitive performance of GPT models in translating high-resource languages. It also discussed the limited capabilities for low-resource languages and the benefits of hybrid approaches that combine GPT models with other translation systems.
Innovative Approaches: Research on new trends in machine translation, such as “New Trends in Machine Translation using Large Language Models: Case Examples with ChatGPT”, explored innovative directions like stylized and interactive machine translation. These approaches allow for translations that match specific styles or genres and enable user participation in the translation process, enhancing accuracy and fluency.

The findings were clear — LLMs present a significant opportunity to enhance Pontoon and improve translation quality.

Why We Chose This Path

Why go through this transformation? Because language is personal. Take the phrase “Firefox has your back.” In English, it conveys reliability and trust. A direct translation might miss this idiomatic expression, interpreting it literally as “someone has ownership of your back”, which could confuse or mislead users. LLMs can help maintain the intended meaning and nuance, ensuring that every translated phrase feels as though it was originally crafted in the user’s native language.

We can use the in-context learning ability of LLMs to help with this. In-context learning is a technique in which an engineered prompt tells the model about your data and preferences, so that the model can take them into account as it generates its response.
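As a minimal sketch of what such an engineered prompt might look like, the hypothetical helper below combines instructions, a few example translations, and the string to translate into a single prompt. The function name, the wording of the instructions, and the placeholder example pair are all assumptions made for illustration; this is not the prompt used in Pontoon.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A source string paired with a translation a localizer considers good."""
    source: str
    translation: str


def build_prompt(text, target_language, examples, keep_terms=()):
    """Assemble an in-context prompt (hypothetical wording, for illustration only)."""
    lines = [
        f"Translate the text below from English to {target_language}.",
        "Preserve the intended meaning of idioms rather than translating word for word.",
        "Reply with the translation only.",
    ]
    if keep_terms:
        # Brand names and similar terms that must appear unchanged in the output.
        lines.append("Do not translate these terms: " + ", ".join(keep_terms) + ".")
    # The examples are the "in-context" part: they show the model the desired style.
    for example in examples:
        lines.append(f"English: {example.source}")
        lines.append(f"{target_language}: {example.translation}")
    # The string to translate comes last, with the answer slot left open.
    lines.append(f"English: {text}")
    lines.append(f"{target_language}:")
    return "\n".join(lines)


# Placeholder example; a real one would come from a localizer.
examples = [Example("Break a leg!", "<idiomatic translation supplied by a localizer>")]
prompt = build_prompt("Firefox has your back", "Bengali", examples, keep_terms=("Firefox",))
print(prompt)
```

The resulting string would then be sent to whichever model is in use. Keeping the prompt in one function makes it easier to compare variants of the instructions later.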

Experimenting: A Case Study with ChatGPT and GPT-4

To illustrate the effectiveness of our approach, I conducted a practical experiment with OpenAI’s ChatGPT, powered by GPT-4. I asked ChatGPT to translate the string “Firefox has your back” to Bengali. The initial translation roughly translates to “Firefox is behind you”, which doesn’t convey the original meaning of the string.

Screenshot of first interaction with ChatGPT

Asking GPT-4 to translate the string “Firefox has your back” to Bengali.

Now, it seems our friendly ChatGPT decided to go rogue and translated “Firefox” despite being told not to! Additionally, instead of simply providing the translation as requested, it gave a verbose introduction and even threw in an English pronunciation guide. This little mishap underscores a crucial point: the quality of the output heavily depends on how well the input is framed. It appears the AI got a bit too eager and forgot its instructions.

This experiment shows that even advanced models like GPT-4 can stumble if the prompt isn’t just right. We’ll dive deeper into the art and science of prompt engineering later, exploring how to refine prompts to guide the model towards more accurate and contextually appropriate translations.
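The two problems seen in this first attempt (a translated brand name and extra text around the translation) are the kind of thing a simple automated check could flag before a suggestion reaches a localizer. The sketch below is a hypothetical illustration of that idea, not part of Pontoon; the function name and the rules it applies are assumptions.

```python
def check_translation(output, protected_terms=("Firefox",)):
    """Return a list of problems found in a model reply (empty if none were found)."""
    problems = []
    # Brand names should appear unchanged in the translated string.
    for term in protected_terms:
        if term not in output:
            problems.append(f"protected term missing: {term}")
    # A reply with several lines probably contains an introduction or extra notes.
    if len(output.strip().splitlines()) > 1:
        problems.append("reply spans multiple lines; expected the translation only")
    return problems


# Made-up replies, standing in for real model output.
verbose_reply = "Sure! Here is the translation:\n<translated text>\n(Pronunciation: ...)"
clean_reply = "Firefox <translated text>"

print(check_translation(verbose_reply))
print(check_translation(clean_reply))
```

A check like this only catches mechanical mistakes. Whether the translation conveys the right meaning still needs a human reviewer.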

Next, I asked ChatGPT to translate the same string to Bengali, this time specifying that it should keep the original meaning of the string.

Screenshot of second interaction with ChatGPT

Asking GPT-4 to translate the string “Firefox has your back” to Bengali, while maintaining the original meaning of the string.

With the adjusted prompt, the translation evolved to “Firefox is with you”, a version that better captured the essence of the phrase.

I then used Google Translate to translate the same string.

Using Google Translate to translate the string “Firefox has your back” to Bengali.

For comparison, Google Translate offered a translation similar to GPT-4’s first attempt, which roughly translates to “Firefox is behind you”. This highlights the typical challenges faced by conventional machine translation tools.

This experiment underscores the potential of stylized machine translation to enhance translation quality, especially for idiomatic expressions or specific styles like formal or informal language.

The Essential Role of Prompt Engineering in AI Translation

Building on these insights, we dove deeper into the art of prompt engineering, a critical aspect of working with LLMs. This process involves crafting inputs that precisely guide the AI to generate accurate and context-aware outputs. Effective prompt engineering enhances the accuracy of translations, streamlines the translation process by reducing the need for revisions, and allows for customization to meet specific cultural and stylistic preferences.

Working together with the localization team, we tested a variety of prompts in languages like Italian, Slovenian, Japanese, Chinese, and French. We assessed each translation on its clarity and accuracy, categorizing them as unusable, understandable, or good. After several iterations, we refined our prompts to ensure they consistently delivered high-quality results, preparing them for integration into Pontoon’s Machinery tab.
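To make the assessment step concrete, here is a rough sketch of how ratings on the three-level scale could be tallied per prompt variant. The data structure, the prompt identifiers, and the sample ratings are invented for illustration and do not reflect our actual results.

```python
from collections import Counter, defaultdict

RATINGS = ("unusable", "understandable", "good")


def summarize(assessments):
    """Count ratings per prompt variant.

    `assessments` is an iterable of (prompt_id, locale, rating) tuples.
    """
    summary = defaultdict(Counter)
    for prompt_id, _locale, rating in assessments:
        if rating not in RATINGS:
            raise ValueError(f"unknown rating: {rating!r}")
        summary[prompt_id][rating] += 1
    return summary


def share_good(counter):
    """Fraction of translations rated "good" (0.0 when there are no ratings)."""
    total = sum(counter.values())
    return counter["good"] / total if total else 0.0


# Made-up sample data: two hypothetical prompt variants rated in three locales.
sample = [
    ("prompt-a", "it", "good"),
    ("prompt-a", "ja", "unusable"),
    ("prompt-a", "fr", "understandable"),
    ("prompt-b", "it", "good"),
    ("prompt-b", "ja", "understandable"),
    ("prompt-b", "fr", "good"),
]

for prompt_id, counts in summarize(sample).items():
    print(prompt_id, dict(counts), f"good: {share_good(counts):.0%}")
```

Comparing the share of "good" ratings across variants is one simple way to decide which prompt to carry into the next iteration.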

How It Works: Bringing LLMs to Pontoon

Above is a demonstration of using the “Rephrase” option on the string “Firefox has your back” for the Italian locale. The original suggestion from Google’s Machine Translation meant “Firefox covers your shoulders”, while the rephrased version means “Firefox protects you”.

After working on the prompt engineering and implementation, we’re excited to announce the integration of LLM-assisted translations into Pontoon. For all locales utilizing Google Translate as a translation source, a new AI-powered dropdown is now available within the ‘Machinery’ tab. The feature is limited to these locales so that we can gather insights on usage patterns before considering broader integration. Opening this dropdown reveals three options:

REPHRASE: Generate an alternative to this translation.

MAKE FORMAL: Generate a more formal version of this translation.

MAKE INFORMAL: Generate a more informal version of this translation.

After selecting an option, the revised translation will replace the original suggestion. Once a new translation is generated, another option SHOW ORIGINAL will be available in the dropdown menu. Selecting it will revert to the original suggestion.
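The behavior described above can be modeled as a small piece of state. The sketch below is not Pontoon’s implementation; the class and method names are hypothetical, `generate` stands in for the call to the LLM, and two details are assumptions: that each revision starts from the original suggestion, and that SHOW ORIGINAL disappears again once the original is restored.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Option(Enum):
    REPHRASE = "Generate an alternative to this translation."
    MAKE_FORMAL = "Generate a more formal version of this translation."
    MAKE_INFORMAL = "Generate a more informal version of this translation."


@dataclass
class Suggestion:
    original: str                  # the machine translation suggestion
    revised: Optional[str] = None  # the LLM revision, if one was generated

    @property
    def displayed(self) -> str:
        """The text currently shown to the localizer."""
        return self.original if self.revised is None else self.revised

    def available_options(self):
        """Labels shown in the dropdown for the current state."""
        labels = [option.name.replace("_", " ") for option in Option]
        if self.revised is not None:
            labels.append("SHOW ORIGINAL")
        return labels

    def apply(self, option: Option, generate: Callable[[str, Option], str]) -> None:
        """Replace the displayed suggestion with a revision (assumed to start from the original)."""
        self.revised = generate(self.original, option)

    def show_original(self) -> None:
        """Revert to the original suggestion."""
        self.revised = None


def fake_generate(text: str, option: Option) -> str:
    """Stand-in for the LLM call; it only tags the text with the chosen option."""
    return f"[{option.name}] {text}"


suggestion = Suggestion("<machine translation suggestion>")
suggestion.apply(Option.REPHRASE, fake_generate)
print(suggestion.displayed, suggestion.available_options())
suggestion.show_original()
print(suggestion.displayed, suggestion.available_options())
```

Keeping the original suggestion alongside the revision is what makes reverting cheap: no second request to the translation service is needed.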

The Future of Translation is Here

As we continue to integrate Large Language Models (LLMs) into Mozilla’s Pontoon, we’re not just transforming our translation processes — we’re redefining how linguistic barriers are overcome globally. By enhancing translation accuracy, maintaining cultural relevance, and capturing the nuances of language through the use of LLMs, we’re excited about the possibilities this opens up for users worldwide.

However, it’s important to emphasize that the role of our dedicated community of localizers remains central to this process. LLMs and machine translation tools are not used without the supervision and expertise of experienced localizers. These tools are designed to support, not replace, the critical work of our localizers who ensure that translations are accurate and culturally appropriate.

We are eager to hear your thoughts. How do you see this impacting your experience with Mozilla’s products? Do the translations meet your expectations for accuracy? Your feedback is invaluable as we strive to refine and perfect this technology. Please share your thoughts and experiences in the comments below, reach out to us on Matrix, or file an issue. Together, we can make the web a place without language barriers.