{"id":599,"date":"2018-07-12T06:25:03","date_gmt":"2018-07-12T13:25:03","guid":{"rendered":"http:\/\/blog.mozilla.org\/axel\/?p=599"},"modified":"2018-07-12T06:25:03","modified_gmt":"2018-07-12T13:25:03","slug":"localization-translation-and-machines","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/axel\/2018\/07\/12\/localization-translation-and-machines\/","title":{"rendered":"Localization, Translation, and Machines"},"content":{"rendered":"<p>TL;DR: Is there research bringing together Software Analysis and Machine Translation to yield Machine Localization of Software?<\/p>\n<blockquote cite=\"https:\/\/brendycaldwell.com\/2013\/03\/19\/on-craic-im-telling-you-there-is-no-word-for-yes-or-no-in-irish\/\"><p>I\u2019m Telling You, There Is No Word For \u2018Yes\u2019 Or \u2018No\u2019 In Irish<\/p><\/blockquote>\n<p><cite>from <a href=\"https:\/\/brendycaldwell.com\/2013\/03\/19\/on-craic-im-telling-you-there-is-no-word-for-yes-or-no-in-irish\/\">Brendan Caldwell<\/a><\/cite><\/p>\n<p>The art of localizing a piece of software with a <em>Yes<\/em> button is to know what that button will do. This is an example of software UI that makes assumptions on language that hold for English, but might not for other languages. 
A more frequent example, both in UI and in the languages it affects, is piecing together text and UI controls:<\/p>\n<p><a href=\"https:\/\/blog.mozilla.org\/axel\/files\/2018\/07\/Pieced-together-UI.png\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.mozilla.org\/axel\/files\/2018\/07\/Pieced-together-UI-300x94.png\" alt=\"\" width=\"300\" height=\"94\" class=\"aligncenter size-medium wp-image-600\" srcset=\"https:\/\/blog.mozilla.org\/axel\/files\/2018\/07\/Pieced-together-UI-300x94.png 300w, https:\/\/blog.mozilla.org\/axel\/files\/2018\/07\/Pieced-together-UI-768x240.png 768w, https:\/\/blog.mozilla.org\/axel\/files\/2018\/07\/Pieced-together-UI.png 984w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>In the localization tool, you&#8217;ll find each of those entries as individual strings. The localizer will recognize that they&#8217;re part of one flow, and will move fragments from the shared string to the drop-down as needed. Merely translating the individual segments is not going to be a proper localization of that piece of UI.<\/p>\n<p>If we were to build a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Rule-based_machine_translation\">rule-based<\/a> machine localization system, we&#8217;d find rules like<\/p>\n<ul>\n<li><code>gaelic-yes<\/code>:<br \/>\nIf the title of your dialog contains a verb, localize <em>Yes<\/em> by translating the found verb.<\/p>\n<li><code>pieced-ui<\/code>:<br \/>\nFor each variant,<\/p>\n<ul>\n<li>Piece together the fragments of English into a single sentence\n<li>Translate the sentences into the target language\n<li>Find shared content in positions matching the original layout\n<li>Split each translated fragment, and adjust the casing and spacing\n<li>Map the subfragments to the localization of the English individual fragments\n<\/ul>\n<p>Map the shared fragment to the localization of the English shared fragment\n<\/ul>\n<p>Now that&#8217;s rule-based, and it&#8217;d be tedious to maintain these 
rules. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Neural_machine_translation\">Neural Machine Translation<\/a> (NMT) has all the buzz now, as does Machine Learning in general. There is <a href=\"https:\/\/slator.com\/academia\/here-machine-translation-researchers-are-geeking-out-on\/\">plenty of research<\/a> that improves how NMT systems learn about the context of the sentence they&#8217;re translating. But that&#8217;s all text.<\/p>\n<p>It&#8217;d be awesome if we could bring Software Analysis into the mix, and train NMT to localize software instead of translating fragments.<\/p>\n<p>For Firefox, could one train on English and localized DOM? Could a similar approach work for Android&#8217;s XML layouts? For projects with automated screenshots, could one train on those? Is there enough software out there to successfully train a neural network?<\/p>\n<p>Do you know of existing research in this direction?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>TL;DR: Is there research bringing together Software Analysis and Machine Translation to yield Machine Localization of Software? I\u2019m Telling You, There Is No Word For \u2018Yes\u2019 Or \u2018No\u2019 In Irish from Brendan Caldwell The art of localizing a piece of software with a Yes button is to know what that button will do. 
This is [&hellip;]<\/p>\n","protected":false},"author":17,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7,5],"tags":[23779,23778],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/posts\/599"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/users\/17"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/comments?post=599"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/posts\/599\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/media?parent=599"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/categories?post=599"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/axel\/wp-json\/wp\/v2\/tags?post=599"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}