This is the last part of a three part series. (Part 1) (Part 2)
The basic premise of our strategy for dynamic localization was to replace actual strings in the database with ints, which were foreign keys into a `translations` table that held the actual strings. The `translations` table looks like:
+------------------+------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +------------------+------------------+------+-----+---------+-------+ | id | int(11) unsigned | | PRI | 0 | | | locale | varchar(10) | | PRI | | | | localized_string | text | YES | | NULL | | | created | datetime | YES | | NULL | | | modified | datetime | YES | | NULL | | +------------------+------------------+------+-----+---------+-------+
When combined, our id and locale create the primary key which will look up a textual value in the localized_string field. This method has the advantage of utilizing foreign keys (one weakness of the PEAR::Translation2 method). The price we pay is complexity – if we want to look at any information in the database, there will be joins, and plenty of them.
Striding forward with our plan, we hit a roadblock with the way CakePHP handles associations. Specifically, CakePHP doesn’t allow us to associate models on arbitrary columns. For example, a hasMany relationship allows you to choose the “foreignKey” for one table, but the other is assumed to be the primary key. This means we can’t just setup relationships in the model and forget about them. Using CakePHP’s relationship model still seemed like the right thing to do, even if it wouldn’t be entirely automatic.
Firstly, to identify which fields in each model would be translated, we added another array to each model called $translated_fields.
Next, we had to build a relationship between whichever model we were using and the Translation model. Since we had to use custom SQL to retrieve the translations, we had to build this relationship on the fly for every object that had translations. Using app_model’s beforeFind() made this, relatively, easy. The code we used to build the relationship is fairly long, but it’s straightforward and well commented. The more notable part is the freakishly large and complex queries that come out of it. For example, this is an actual query that is run to retrieve information for an addon (forgive the strange line breaks – I’m trying to fit it in the wordpress column):
SELECT `Addon`.`id`, `Addon`.`guid`, IFNULL(`tr_name`.localized_string, `fb_name`.localized_string) AS `name`, `Addon`.`defaultlocale`, `Addon`.`addontype_id`, `Addon`.`status`, `Addon`.`icontype`, IFNULL(`tr_homepage`.localized_string, `fb_homepage`.localized_string) AS `homepage`, IFNULL(`tr_description`.localized_string, `fb_description`.localized_string) AS `description`, IFNULL(`tr_summary`.localized_string, `fb_summary`.localized_string) AS `summary`, `Addon`.`averagerating`, `Addon`.`weeklydownloads`, `Addon`.`totaldownloads`, IFNULL(`tr_developercomments`.localized_string, `fb_developercomments`.localized_string) AS `developercomments`, `Addon`.`inactive`, `Addon`.`trusted`, `Addon`.`viewsource`, `Addon`.`prerelease`, `Addon`.`adminreview`, `Addon`.`sitespecific`, `Addon`.`externalsoftware`, IFNULL(`tr_eula`.localized_string, `fb_eula`.localized_string) AS `eula`, IFNULL(`tr_privacypolicy`.localized_string, `fb_privacypolicy`.localized_string) AS `privacypolicy`, `Addon`.`created`, `Addon`.`modified`, IF(!ISNULL(`tr_description`.localized_string), `tr_description`.locale, `fb_description`.locale) AS `description_locale`, IF(!ISNULL(`tr_developercomments`.localized_string), `tr_developercomments`.locale, `fb_developercomments`.locale) AS `developercomments_locale`, IF(!ISNULL(`tr_eula`.localized_string), `tr_eula`.locale, `fb_eula`.locale) AS `eula_locale`, IF(!ISNULL(`tr_homepage`.localized_string), `tr_homepage`.locale, `fb_homepage`.locale) AS `homepage_locale`, IF(!ISNULL(`tr_name`.localized_string), `tr_name`.locale, `fb_name`.locale) AS `name_locale`, IF(!ISNULL(`tr_privacypolicy`.localized_string), `tr_privacypolicy`.locale, `fb_privacypolicy`.locale) AS `privacypolicy_locale`, IF(!ISNULL(`tr_summary`.localized_string), `tr_summary`.locale, `fb_summary`.locale) AS `summary_locale` FROM `addons` AS `Addon` LEFT JOIN translations AS `tr_description` ON ( `Addon`.`description` = `tr_description`.id AND `tr_description`.locale='en-US' ) LEFT JOIN translations AS `fb_description` ON (`Addon`.`description` = `fb_description`.id AND `fb_description`.locale=`Addon`.`defaultlocale`) LEFT JOIN translations AS `tr_developercomments` ON (`Addon`.`developercomments` = `tr_developercomments`.id AND `tr_developercomments`.locale='en-US') LEFT JOIN translations AS `fb_developercomments` ON (`Addon`.`developercomments` = `fb_developercomments`.id AND `fb_developercomments`.locale=`Addon`.`defaultlocale`) LEFT JOIN translations AS `tr_eula` ON (`Addon`.`eula` = `tr_eula`.id AND `tr_eula`.locale='en-US') LEFT JOIN translations AS `fb_eula` ON (`Addon`.`eula` = `fb_eula`.id AND `fb_eula`.locale=`Addon`.`defaultlocale`) LEFT JOIN translations AS `tr_homepage` ON (`Addon`.`homepage` = `tr_homepage`.id AND `tr_homepage`.locale='en-US') LEFT JOIN translations AS `fb_homepage` ON (`Addon`.`homepage` = `fb_homepage`.id AND `fb_homepage`.locale=`Addon`.`defaultlocale`) LEFT JOIN translations AS `tr_name` ON (`Addon`.`name` = `tr_name`.id AND `tr_name`.locale='en-US') LEFT JOIN translations AS `fb_name` ON (`Addon`.`name` = `fb_name`.id AND `fb_name`.locale=`Addon`.`defaultlocale`) LEFT JOIN translations AS `tr_privacypolicy` ON (`Addon`.`privacypolicy` = `tr_privacypolicy`.id AND `tr_privacypolicy`.locale='en-US') LEFT JOIN translations AS `fb_privacypolicy` ON (`Addon`.`privacypolicy` = `fb_privacypolicy`.id AND `fb_privacypolicy`.locale=`Addon`.`defaultlocale`) LEFT JOIN translations AS `tr_summary` ON (`Addon`.`summary` = `tr_summary`.id AND `tr_summary`.locale='en-US') LEFT JOIN translations AS `fb_summary` ON (`Addon`.`summary` = `fb_summary`.id AND `fb_summary`.locale=`Addon`.`defaultlocale`) LEFT JOIN `addontypes` AS `Addontype` ON `Addon`.`addontype_id` = `Addontype`.`id` WHERE (`Addon`.`id` = 9) AND (`Addon`.`inactive` = 0) AND (`Addon`.`addontype_id` IN ('1', '2') ) LIMIT 1
That query went through several revisions, all of which had their pros and cons. Despite it’s intimidating length, the structure has a purpose. A few notes:
- You can see which columns have translations in the SELECT pretty easily by looking for the IFNULL() function
- The left outer joins allow us to get nulls back where the string doesn’t exist. Our initial query used inner joins, and when a string didn’t exist in the database, we’d get nothing back.
- The union merges two nearly identical queries – one for the localized content, and one for the English fall back content.
- The outer “SELECT AS” statement exists so the whole query plays nicely with cake. CakePHP’s dbo implementation assumes the model name will be the index into the array where it will store data. If that name isn’t set, it will automatically pluralize the name, which eventually causes a fatal error.
In our previous revisions of the query, we had to use a complimentary afterFind() in the Translation model to organize the strings into a structure that was more usable by the views. It automatically looked if a string in the language was null, and if so, replaced it with the English version. The query above makes this unnecessary, as that workload is passed off to MySql. With either method, the end result is an array filled with dynamically translated strings, and their locales (in case we had to fall back, the view needs to know the string is in a different language).
Our method has several shortcomings, some specific to our code, and others that are a problem with most (all?) localization efforts. We’ll probably address these in the future, but in the mean time, some trouble spots are:
- Recursion. At this point when we do a query, beforeFind() only runs on the first model. Someone already filed cakephp’s Ticket 1183 on the issue. We’ve had to do a couple extra queries on some complex pages in order to get all the strings we need.
- We currently only support one fallback language. It would be nice to have multiple language fallback. For example, from zh-TW -> zh-CN -> en-US. This would be easiest to implement by just looking for another language (when appropriate) with each query, but I’d like to come up with a more efficient way (in terms of query length and complexity).
- Declinations, capitalization differences, etc. – You’ll find these in any localization project, but it’s worth mentioning.
Since it’s been a while since the previous posts, a quick summary is probably in order. At this point, we’ve successfully combined localization for static and dynamic content, using systems that are fast and robust. Things are more complex because of it, but that’s par for the course when it comes to localization. What we’ve done here is an exciting step forward for enabling people to share their firefox extensions with people around the world.
Looking back over these 3 posts, I see most of the posts have been high level ideas, and less about the actual integration into the CakePHP code. If there is sufficient interest, I can make another post describing:
- How to detect/handle the language in the URL
- How to initialize/remap languages
- More on integrating the dynamic parts of the localization strategy into CakePHP (probably focusing on the view).
- Anything else you’re curious about
If you’re less interested in reading my ramblings, and more interested in code, let me point out the SVN repository one last time.
Anders wrote on :
Frédéric Wenzel wrote on :
Ugo PARSI wrote on :
Rostislav Palivoda wrote on :