This is part two of a three-part series. (Part 1)
For the static content, we decided to use PHP’s built-in gettext functions. Let’s double check the requirements from part 1:
- Speed: Strings are pulled from a binary file and aggressively cached by Apache.
- Robustness: Gettext has been a standard for years and is used in a wide variety of systems reliably.
- Friendliness: Due to its age and widespread use, the .po file that is distributed to localizers is widely recognized, and several applications exist to assist with translation.
- Anything else? There are command line programs for creating and merging gettext files already.
Fantastic, it fits the bill and hopefully will be straightforward to implement.
One of the first questions that arose was whether we should use actual phrases or placeholder strings in the template files. This would be the difference between, for example, “Welcome to Remora” and “header_welcome” in the template files. Using the actual strings would make translation simpler, but if we wanted to make a minor change to a phrase in English (like adding a comma) we’d have to regenerate the .po files, remerge, and reverify them. If we used placeholder strings, we’d lose gettext’s built-in fallback of returning the input string when a match can’t be found in the .po file, and the strings wouldn’t be as straightforward for localizers to translate.
After polling some people with expertise in localization, we reached a surprisingly unanimous decision to use placeholder strings. We’d just have to make sure our translations existed so we didn’t need to depend on gettext’s built-in fallback.
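With placeholder strings, a missing translation shows up on the page as the raw key, so it helps to be able to check for that. Here is a minimal sketch of such a check. The function name `find_untranslated()` and the stub catalog are hypothetical, made up for illustration; in real use the callable would be `'gettext'`, while here a plain array stands in so the example runs without the extension.

```php
<?php
/**
 * Return the placeholder keys for which the translator hands back the key
 * itself, which is what gettext does when no translation is found.
 *
 * find_untranslated() and its parameters are hypothetical names used for
 * illustration; they are not part of gettext or Remora.
 */
function find_untranslated(array $keys, callable $translate): array
{
    $missing = [];
    foreach ($keys as $key) {
        if ($translate($key) === $key) {
            $missing[] = $key;
        }
    }
    return $missing;
}

// Stub catalog standing in for a compiled translation file (assumption).
$catalog = ['header_welcome' => 'Welcome to Remora'];
$stub = function (string $key) use ($catalog): string {
    return $catalog[$key] ?? $key; // mimic gettext's fallback behaviour
};

$missing = find_untranslated(['header_welcome', 'header_message_num'], $stub);
echo implode(', ', $missing), "\n";
```

A check like this could run once per locale against the full list of keys used in the templates.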
A tutorial exists that does a great job covering setup and basic use of gettext already, so I’ll skip the fundamentals that are explained there (but be sure to read it before you continue here!). One aspect worth mentioning in addition to the ONLamp tutorial is supporting plural forms. In English, this usually means adding an ‘s’ – for example, nacho vs. nachos. In other languages (Polish is the classic example), the plural forms get much more complex, often depending on knowing the number of nachos instead of just knowing you have more than one.
Gettext supports multiple plural forms through a “Plural-Forms” header in the .po file. The header can contain a fairly complicated chain of ternary operators that, when evaluated for a given count, produces a number. That number is used as an index into the array of translations (the msgstr[] entries) in the .po file. That’s a confusing couple of sentences, so let’s have an example. If we were just dealing with English, we could write something like this to handle plural forms:
<?php
if ($number == 1) {
    echo "You have 1 message.";
} else {
    echo "You have {$number} messages.";
}
?>
If we were to convert that directly to gettext, we’d be left with two strings to translate whose only difference is the plural. Lucky for us, gettext supports plurals. Unfortunately, it requires an inconvenient change to your code wherever you need to support it. To make the above code gettext/plural friendly, we’ll use the ngettext function. Using this function, we can pass the $number variable to gettext so it can determine which array index to return.
<?php
// It looks like we have some redundancy here, but that's the way it works -
// Since we're using placeholder strings, the first and second parameters are the same.
// $number shows up twice because we're passing it to ngettext() and sprintf()
echo sprintf(ngettext('header_message_num', 'header_message_num', $number), $number);
?>
The parts of the English .po file that are relevant to this example would look like:
Plural-Forms: nplurals=2; plural=n != 1;

msgid "header_message_num"
msgid_plural "header_message_num"
msgstr[0] "You have 1 message"
msgstr[1] "You have %d messages"
The msgid and the msgid_plural correspond to the first and second parameters to ngettext (in our case, they’re equal). The $number variable we passed to gettext is run through the algorithm given in the Plural-Forms header, and results in either a zero or a one – the index to the msgstr array. The gettext manual has a section on plural forms that gives more complex examples, including the algorithms for other languages.
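To make the index arithmetic concrete for a language with more than two forms, here is a small sketch of the Polish rule as listed in the gettext manual's plural forms section. The function name `polish_plural_index()` is hypothetical. Gettext evaluates the header expression internally, so this only illustrates how a count maps to a msgstr index.

```php
<?php
/**
 * Compute the msgstr[] index for Polish, following the rule listed in the
 * gettext manual:
 *   nplurals=3; plural=(n==1 ? 0 :
 *     n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
 *
 * polish_plural_index() is a hypothetical name used for illustration only.
 * The nested ternary is unrolled into if statements because PHP parses
 * chained ternaries differently from C.
 */
function polish_plural_index(int $n): int
{
    if ($n === 1) {
        return 0;
    }
    $lastDigit = $n % 10;
    $lastTwo = $n % 100;
    if ($lastDigit >= 2 && $lastDigit <= 4 && ($lastTwo < 10 || $lastTwo >= 20)) {
        return 1;
    }
    return 2;
}

// Show which msgstr index a few sample counts select.
foreach ([1, 2, 5, 12, 22, 25] as $n) {
    echo $n, ' => msgstr[', polish_plural_index($n), "]\n";
}
```

Note that 2 and 22 share a form while 12 does not, which is why knowing "more than one" is not enough.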
Overall, gettext works as advertised and fulfills our requirements for static localization, save a couple of headaches. Firstly, it employs very aggressive caching, and sometimes it can get a little carried away. In fact, we’ve been unsuccessful in finding a way to disable or flush the gettext cache without restarting Apache. This is an inconvenience but not a deal killer for us. Hopefully, once they’re finished, our translations won’t change much, but it’s still annoying enough to wonder why there isn’t a more convenient solution.
The second headache is with gettext’s feature set: it doesn’t support declensions. Since, as far as I know, this is a foreign concept in English, let’s look at an example in Spanish. Let’s say we have the following sentence we want to represent in gettext (I’ll skip the placeholder strings for the sake of simplicity):
<?php echo sprintf(_('Come el %s.'), $fruit); ?>
If you’re familiar with Spanish, you’ll notice that’s a masculine sentence, which would be appropriate if $fruit were “plátano”. However, what if $fruit were “pera”? The sentence would come out as “Come el pera.” when it should say “Come la pera.” As it currently stands, gettext doesn’t support a way to deal with the genders of words. For Remora, we’re going to have to depend on some creative wording to help us avoid situations like the above example.
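As a rough illustration of what rewording can look like, the sketch below contrasts the sentence above with a phrasing that keeps the variable away from any article. This is an assumption about one possible rewording, not what Remora does. The array is a stub standing in for a Spanish .po file, and both English keys and the second Spanish string are invented for the example.

```php
<?php
// Stub translations standing in for a Spanish .po file (assumption);
// the keys and the reworded message are made up for illustration.
$es = [
    'Eat the %s.'      => 'Come el %s.',           // article only agrees with masculine nouns
    'Fruit to eat: %s' => 'Fruta para comer: %s',  // no article attached to the variable
];

foreach (['plátano', 'pera'] as $fruit) {
    echo sprintf($es['Eat the %s.'], $fruit), "\n";
    echo sprintf($es['Fruit to eat: %s'], $fruit), "\n";
}
```

The first pattern breaks for “pera”, while the second reads correctly for either noun at the cost of a less natural sentence.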
Despite the shortcomings, I think gettext was the right decision. It has some hiccups that are frustrating, but it’s still the best thing out there. Look forward to another long and complicated post about dynamic localization in the future…
Addendum: After I wrote this post, I saw in the news that CakePHP 1.2 now boasts gettext() functionality in the core – good work to all involved. 🙂