Why l10n tools should be editors instead of serializers

If your tool serializes internal state instead of editing files, it’ll do surprising things if it encounters surprising content. Like, turn

errNotSemicolonTerminated=Named character reference was not terminated by a semicolon. (Or “&” should have been escaped as “&amp;”.)

into

errNotSemicolonTerminated=Named character reference was not terminated by a semicolon. (Or “” should have been escaped as “amp;”.)

And that’s for a string the localizer never touched.

(likely narro issue 316)
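
To make the distinction concrete, here is a minimal Python sketch. It is not how narro or any other real tool is implemented. The function names, the `exampleKey` entry and the simplified key=value parsing (no comments, no continuation lines, no “:” separators) are all assumptions for illustration. The serializer treats every “&” as an accesskey marker and drops it on the way through. The editor only rewrites the lines it was asked to change and copies everything else verbatim.

```python
# Illustrative sketch only: hypothetical helpers, simplified .properties parsing.

SOURCE = (
    "errNotSemicolonTerminated=Named character reference was not terminated "
    "by a semicolon. (Or “&” should have been escaped as “&amp;”.)\n"
    "exampleKey=Some text.\n"  # hypothetical second entry
)


def naive_serialize(text: str) -> str:
    """Load every entry into internal state, then write all of it back out.

    Assumes (wrongly) that "&" always marks an accesskey, strips it on load,
    and so loses it for every string, touched or not.
    """
    out = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            out.append(line)
            continue
        value = value.replace("&", "")  # the lossy step
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


def edit_in_place(text: str, changes: dict[str, str]) -> str:
    """Rewrite only the lines whose key is in `changes`; copy the rest as-is."""
    out = []
    for line in text.splitlines(keepends=True):
        key, sep, _ = line.partition("=")
        if sep and key.strip() in changes:
            # Preserve whatever line ending the original line had.
            ending = line[len(line.rstrip("\r\n")):]
            out.append(f"{key}={changes[key.strip()]}{ending}")
        else:
            out.append(line)  # untouched lines stay byte-identical
    return "".join(out)


if __name__ == "__main__":
    print(naive_serialize(SOURCE).splitlines()[0])
    print(edit_in_place(SOURCE, {"exampleKey": "Ein Text."}).splitlines()[0])
```

With the editor, a string the localizer never touched cannot change, no matter what surprising content it holds, because the tool never interprets it in the first place.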

The Conversation {2 comments}

  1. Ian Thomas {Friday July 20, 2012 @ 10:32 am}

    Your blog software should be more careful with its encoding/escaping too! I assume that’s meant to read “(Or [ampersand character] should have been escaped as [ampersand html entity])”

    Surely that’s just a bug in the tool though, which appears to be attempting to handle html entities and failing (a typical encoding/escaping/injection mistake). I don’t see that asking humans to edit files directly would make it any better.

    There should probably be a unit test in the software that reads, unserializes and reserializes a complicated file to make sure it’s not making any unintentional changes.

  2. Axel Hecht {Friday July 20, 2012 @ 10:37 am}

    Yes, you’re right, post updated.

    Regarding tests, you can only test for things you know about, and with the size of the Firefox community, you only ever know a small fraction of the things that coders do just because it works in en-US.

    The culprit is the handling of ‘&’ as notifier for accesskeys, which is an old evil that some poor soul introduced ages ago. No idea why he thought that’d be a good idea.
