L20n features explained. DOM overlays.

(This is a crosspost from my blog: http://informationisart.com/8/. Check it out for better code formatting and syntax highlighting.)

With L20n’s DOM overlays, developers can amend localized strings with additional non-localizable HTML markup. This improves the separation of content and structure and reduces the cruft in localization files.

When it comes to localizing web content, you’re likely to end up with a lot of HTML inside your source strings. You might see markup like <strong> and <em>, or simply links to other resources with <a> tags.

Consider the following paragraph taken from www.mozilla.org.

Portions of this content are ©1998–2012 by individual mozilla.org contributors. Content available under a Creative Commons license.

The HTML code for this paragraph is this:

 <p>
   Portions of this content are ©1998–2012 by individual mozilla.org
   contributors. Content available under a
   <a href="/foundation/licensing/website-content.html">Creative Commons license</a>.
 </p>

You’ll notice the <a> tag with an href attribute. The href is a URL, and it makes this HTML significantly harder to read.

If we wanted to localize this paragraph, the L20n code for English would look like this:

 <licenseInfo """
   Portions of this content are ©1998–2012 by individual
   mozilla.org contributors. Content available under a
   <a href="/foundation/licensing/website-content.html">Creative
   Commons license</a>.
 """>

The URL will always be /foundation/licensing/website-content.html, regardless of the user’s locale. It makes little sense, then, to have it in the source string. The tag makes the string harder to read and increases the risk of introducing an error (e.g., removing a quotation mark or accidentally editing the URL).

In fact, the href attribute is part of the document’s structure rather than its source content, and as such, does not belong in the L20n code at all.

What if L20n let you skip attributes that are not related to the source content? What if it copied those attributes from the developer-defined code, thus sparing the localizer all the trouble?

Enter DOM overlays

The premise is simple: only localizable source content should live in L20n files. Let’s modify the HTML code and the licenseInfo string accordingly.

 <p l10n-id="licenseInfo">
   <a href="/foundation/licensing/website-content.html"></a>
 </p>

The actual content, both source and target, will be injected by L20n. This way the developer doesn’t have to put it in the HTML (although they can). All that matters is the l10n-id="licenseInfo" part, as well as the <a> tag with the href attribute defined in HTML. Everything else happens in the L20n file.

 <licenseInfo """
   Portions of this content are ©1998–2012 by individual
   mozilla.org contributors. Content available under a
   <a>Creative Commons license</a>.
 """>

We keep the <a> tag in the L20n file so that the localizer has control over what is linked and what is not. However, the attributes of the tag are not localizable and are thus absent from the string. The string is easier to read, and also harder to break accidentally.
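To make the mechanism concrete, here is a minimal sketch of the overlay idea in Python. It is not L20n’s actual implementation: the `overlay` helper is hypothetical, and it assumes both the source markup and the translated string are well-formed enough to be parsed as XML.

```python
import xml.etree.ElementTree as ET

# Source markup as the developer wrote it: structure and attributes only.
SOURCE_HTML = (
    '<p l10n-id="licenseInfo">'
    '<a href="/foundation/licensing/website-content.html"></a>'
    '</p>'
)

# The value of the licenseInfo entity: text plus a bare <a> tag.
TRANSLATION = (
    'Portions of this content are ©1998–2012 by individual '
    'mozilla.org contributors. Content available under a '
    '<a>Creative Commons license</a>.'
)


def overlay(source_html: str, translation: str) -> ET.Element:
    """Hypothetical helper: build the localized node from source + translation."""
    source = ET.fromstring(source_html)
    # Wrap the translated fragment in the source tag so it parses as one element.
    result = ET.fromstring(f'<{source.tag}>{translation}</{source.tag}>')
    result.attrib.update(source.attrib)
    # Copy the developer-defined attributes onto the translated <a>.
    source_link = source.find('a')
    translated_link = result.find('a')
    if source_link is not None and translated_link is not None:
        translated_link.attrib.update(source_link.attrib)
    return result


localized = overlay(SOURCE_HTML, TRANSLATION)
print(ET.tostring(localized, encoding='unicode'))
```

The localizer only ever sees the bare <a> tag; the href is taken from the source markup when the string is applied.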

Matching and reordering multiple overlays

L20n’s DOM overlays match HTML nodes by type, name and position. If licenseInfo had two <a> child nodes, their attributes would be copied from the source HTML to the translation in their respective order.

Consider the following example.

Welcome to Pancake, Staś.

The HTML and L20n code responsible for this message might look like this:

 <p l10n-id="welcome">
   <a href="/"></a>
   <a href="/profile"></a>
 </p>

 <welcome """
   Welcome to <a>{{ brandName }}</a>, <a>{{ $user.firstname }}</a>.
 """>

The <a> elements are in the same order in the source code and in the L20n code. L20n will thus copy the href attribute from the first <a> element in the source code to the first <a> element in the L20n code.
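The positional matching described above can be sketched as follows. This is an illustration under assumptions, not L20n’s code: `overlay_by_position` is a hypothetical name, and the sketch pairs each translated child with the source child of the same tag name at the same position among its same-named siblings.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SOURCE_HTML = '<p l10n-id="welcome"><a href="/"></a><a href="/profile"></a></p>'
TRANSLATION = 'Welcome to <a>{{ brandName }}</a>, <a>{{ $user.firstname }}</a>.'


def overlay_by_position(source_html: str, translation: str) -> ET.Element:
    """Hypothetical helper: match children by tag name, then by order."""
    source = ET.fromstring(source_html)
    result = ET.fromstring(f'<{source.tag}>{translation}</{source.tag}>')
    result.attrib.update(source.attrib)
    seen = defaultdict(int)  # how many children of each tag name were matched so far
    for child in result:
        candidates = source.findall(child.tag)  # source children with the same name
        index = seen[child.tag]
        seen[child.tag] += 1
        if index < len(candidates):
            child.attrib.update(candidates[index].attrib)
    return result


localized = overlay_by_position(SOURCE_HTML, TRANSLATION)
hrefs = [link.get('href') for link in localized.findall('a')]
print(hrefs)
```

Because the counter is kept per tag name, elements of different types never compete for the same source node in this sketch.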

Let’s suppose now that the localizer wishes to change the order of the links. Maybe the grammar requires her to do so, or maybe the register is more (or less) formal in her locale. The expected result would be (translated back to English for the sake of this example) this:

Hi Staś. Welcome to Pancake.

Because the order is different, the localizer cannot rely on L20n’s automatching any more. Instead, she needs to instruct L20n which <a> element corresponds to which one in the source. L20n allows her to do so via the l10n-path attribute set on the element, like so:

 <welcome """
   Hi <a l10n-path="a[2]">{{ $user.firstname }}</a>.
   Welcome to <a l10n-path="a[1]">{{ brandName }}</a>.
 """>

Using the XPath syntax, the localizer identifies which source <a> element to copy attributes from. The first <a> element in the translation will be matched against the second <a> element in the source. The meaning of a[2] in XPath is:

the second <a> element among the children (first-generation descendants) of the context node.

(The context node is the source node that’s being localized, in this example the <p> element with l10n-id="welcome".)

In most cases, the XPath expression will be very basic and minimal, as in the examples above. The full XPath syntax is supported, however, allowing for more complex matching.
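As a rough illustration of how an l10n-path hint could be resolved, the sketch below evaluates each path against the source node. It is only an approximation: `overlay_with_paths` is a hypothetical helper, and Python’s ElementTree supports just a limited subset of XPath, which happens to cover simple expressions such as a[2].

```python
import xml.etree.ElementTree as ET

SOURCE_HTML = '<p l10n-id="welcome"><a href="/"></a><a href="/profile"></a></p>'
TRANSLATION = (
    'Hi <a l10n-path="a[2]">{{ $user.firstname }}</a>. '
    'Welcome to <a l10n-path="a[1]">{{ brandName }}</a>.'
)


def overlay_with_paths(source_html: str, translation: str) -> ET.Element:
    """Hypothetical helper: copy attributes from the node each l10n-path names."""
    source = ET.fromstring(source_html)  # the context node for every path
    result = ET.fromstring(f'<{source.tag}>{translation}</{source.tag}>')
    result.attrib.update(source.attrib)
    for child in result:
        # pop() removes the hint so it does not leak into the final markup.
        path = child.attrib.pop('l10n-path', None)
        if path is None:
            continue
        match = source.find(path)  # e.g. 'a[2]' -> second <a> child of <p>
        if match is not None:
            child.attrib.update(match.attrib)
    return result


localized = overlay_with_paths(SOURCE_HTML, TRANSLATION)
hrefs = [link.get('href') for link in localized.findall('a')]
print(hrefs)
```

The first link in the translation, which holds the user’s name, ends up pointing at /profile even though it now comes first in the sentence.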

Lastly, the l10n-path attribute is only required when changing the order of elements of the same type, like two <a> elements. If you want to change the order of child nodes in a string with one <strong> and one <em> tag, you can do so without having to specify the l10n-path attributes.

Privileges and autoextraction

The next step is to see if there’s a need to prevent some attributes from being copied. It might be interesting to extend DOM overlays with a mechanism which only accepts whitelisted attributes, or blocks blacklisted ones from being copied from the source strings to the translation. This could be done globally, or even per-entity.
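Since this mechanism is only an idea at this point, the following is purely a hypothetical sketch of what such a filter might look like. The `filter_attributes` function, its `allowed` and `blocked` parameters, and the sample attributes are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def filter_attributes(attrs, allowed=None, blocked=frozenset()):
    """Hypothetical policy: keep an attribute only if it passes both lists.

    allowed=None means "no whitelist", so every name is a candidate.
    """
    return {
        name: value
        for name, value in attrs.items()
        if (allowed is None or name in allowed) and name not in blocked
    }


source_link = ET.fromstring('<a href="/profile" class="user" onclick="track()"></a>')
translated_link = ET.fromstring('<a>Staś</a>')

# Whitelist: only href and class may cross over to the translation.
translated_link.attrib.update(
    filter_attributes(source_link.attrib, allowed={'href', 'class'})
)
print(ET.tostring(translated_link, encoding='unicode'))
```

A per-entity variant could pass a different `allowed` or `blocked` set for each l10n-id, while a global one would reuse the same sets everywhere.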

I also started working on a maintenance script which extracts the contents of source nodes and automatically creates valid L20n code ready to be localized. It supports whitelisting attributes, but generally leaves most of the attributes out of the L20n code. You can find the code on GitHub, but bear in mind that this was more of an experiment and is very much a work in progress.
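The general idea of such an extraction can be sketched in a few lines. This is not the script mentioned above, only a guess at the approach: `extract_entity` and its `keep` parameter are hypothetical, and the title attribute is added to the sample markup purely to show the whitelist at work.

```python
import xml.etree.ElementTree as ET

SOURCE_HTML = (
    '<p l10n-id="licenseInfo">Content available under a '
    '<a href="/foundation/licensing/website-content.html" title="License">'
    'Creative Commons license</a>.</p>'
)


def extract_entity(source_html: str, keep=frozenset()) -> str:
    """Hypothetical helper: turn a source node into an L20n entity string."""
    node = ET.fromstring(source_html)
    entity_id = node.attrib['l10n-id']
    parts = [node.text or '']
    for child in node:
        # Drop every attribute that is not explicitly whitelisted.
        for name in list(child.attrib):
            if name not in keep:
                del child.attrib[name]
        # tostring() also emits the text that follows the child (its tail).
        parts.append(ET.tostring(child, encoding='unicode'))
    body = ''.join(parts)
    return '<' + entity_id + ' """' + body + '""">'


print(extract_entity(SOURCE_HTML))
print(extract_entity(SOURCE_HTML, keep={'title'}))
```

With an empty whitelist the link is reduced to a bare <a> tag; whitelisting title keeps that one attribute in the string, where a localizer could translate it.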

Discussion

Please post your thoughts in the mozilla.dev.l10n newsgroup.
