{"id":1284,"date":"2018-08-03T23:30:19","date_gmt":"2018-08-03T23:30:19","guid":{"rendered":"http:\/\/blog.mozilla.org\/l10n\/?p=1284"},"modified":"2018-08-06T15:45:21","modified_gmt":"2018-08-06T15:45:21","slug":"intl_pluralrules-a-rust-crate-for-handling-plural-forms-with-cldr-plural-rules","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/l10n\/2018\/08\/03\/intl_pluralrules-a-rust-crate-for-handling-plural-forms-with-cldr-plural-rules\/","title":{"rendered":"intl_pluralrules: A Rust Crate for Handling Plural Forms with CLDR Plural Rules"},"content":{"rendered":"<p><a href=\"https:\/\/crates.io\/crates\/intl_pluralrules\">intl_pluralrules<\/a> is a Rust crate built to handle <a href=\"https:\/\/en.wikipedia.org\/wiki\/Grammatical_number\">pluralization<\/a>. Pluralization is a foundation for <a href=\"https:\/\/en.wikipedia.org\/wiki\/Language_localisation\">localization<\/a> and for many <a href=\"https:\/\/en.wikipedia.org\/wiki\/Internationalization_and_localization\">internationalization<\/a> APIs. With intl_pluralrules, locale-aware date, time, and unit formatting (\u201c1 second\u201d vs \u201c2 seconds\u201d) and many other plural-dependent APIs can now be built in Rust.<\/p>\n<p>Rust joins the family of mature languages such as C++ and Java (via <a href=\"http:\/\/site.icu-project.org\/home\">ICU<\/a>), JavaScript (via <a href=\"https:\/\/tc39.github.io\/ecma402\/\">ECMA 402<\/a>), and Python (via <a href=\"http:\/\/babel.pocoo.org\/\">Babel<\/a>), whose internationalization libraries make writing multilingual software possible. 
All of them draw their plural rules from the same shared database, <a href=\"http:\/\/cldr.unicode.org\/\">Unicode CLDR<\/a> (the Common Locale Data Repository).<\/p>\n<p>intl_pluralrules determines the CLDR plural category of a number (zero, one, two, few, many, or other) using the <a href=\"http:\/\/unicode.org\/reports\/tr35\/tr35-numbers.html#Language_Plural_Rules\">Unicode Language Plural Rules<\/a> syntax and the <a href=\"https:\/\/github.com\/unicode-cldr\/cldr-core\/blob\/master\/supplemental\/plurals.json\">plural rules from Unicode CLDR<\/a>. A localization system can then use that category to pick the correct string variant.<\/p>\n<p>The crate is available on <a href=\"https:\/\/crates.io\/crates\/intl_pluralrules\">crates.io<\/a> and can be used as a library in any Rust program. The <a href=\"https:\/\/github.com\/projectfluent\/fluent-rs\/blob\/master\/fluent\/src\/types.rs#L17-#L72\">Rust implementation<\/a> of the <a href=\"https:\/\/projectfluent.org\/\">Fluent Project<\/a> is the first system to use intl_pluralrules for pluralization.<\/p>\n<h1>Why care about plurals?<\/h1>\n<p>In short, a number can change the form of the words around it. In English, for example, the noun is \u201cpage\u201d for one item and \u201cpages\u201d for any other count. English has only two such forms; other languages can have more, with more complex rules for choosing among them. 
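<\/p>\n<p>As a rough illustration, the sketch below hand-codes the CLDR cardinal rules for English and Russian, restricted to non-negative whole numbers. It is not code from intl_pluralrules. The <code>Category<\/code> enum and the <code>english<\/code> and <code>russian<\/code> functions are hypothetical names chosen for this example.<\/p>
```rust
// Hand-written CLDR cardinal rules for English and Russian, restricted to
// non-negative whole numbers: the operand v is 0 and i is the number itself.
#[derive(Debug, PartialEq)]
pub enum Category {
    One,
    Few,
    Many,
    Other,
}

// English: one if i = 1 and v = 0, otherwise other.
pub fn english(n: u64) -> Category {
    if n == 1 {
        Category::One
    } else {
        Category::Other
    }
}

// Russian: the category depends on the last one and last two digits.
pub fn russian(n: u64) -> Category {
    let last = n % 10;
    let last_two = n % 100;
    if last == 1 && last_two != 11 {
        Category::One // 1, 21, 101, ...
    } else if (2..=4).contains(&last) && !(12..=14).contains(&last_two) {
        Category::Few // 2-4, 22-24, ...
    } else {
        Category::Many // 0, 5-20, 25-30, 111, ...
    }
}
```
<p>For whole numbers, then, English needs two string variants while Russian needs three (one, few, many). Russian reserves its other category for fractional numbers such as 1.5.<\/p>\n<p>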
If software is built around a single plural system, localizing it into languages with different systems becomes needlessly complicated.<\/p>\n<p style=\"padding-left: 30px;\"><a href=\"https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-2.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-1289 size-large\" src=\"https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-2-600x99.png\" alt=\"\" width=\"600\" height=\"99\" srcset=\"https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-2-600x99.png 600w, https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-2-252x41.png 252w, https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-2-768x126.png 768w, https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-2.png 1267w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>The Plural Forms section of Flod\u2019s <a href=\"https:\/\/www.yetanothertechblog.com\/2018\/04\/05\/why-fluent-matters-for-localization\/\">blog post<\/a> on the advantages of Fluent describes the problems that the intl_pluralrules crate addresses and the benefits it brings. If you would like to know more about why crates like intl_pluralrules matter for internationalization (i18n) and localization (l10n) in Rust, start with Flod\u2019s post.<\/p>\n<h1>How intl_pluralrules does plurals<\/h1>\n<p>The intl_pluralrules crate accepts a number and returns its plural category in a given locale. To do this, it performs the steps outlined below:<\/p>\n<h2>Making plural operands from numbers<\/h2>\n<p>Unicode\u2019s Language Plural Rules define a set of properties of a number that determine which plural category it belongs to in a given language. These properties are called <a href=\"http:\/\/unicode.org\/reports\/tr35\/tr35-numbers.html#Operands\">operands<\/a>. 
They are the absolute value (<b>n<\/b>), the integer digits (<b>i<\/b>), the number of visible fraction digits with and without trailing zeros (<b>v<\/b> and <b>w<\/b>), and the visible fraction digits themselves, with and without trailing zeros (<b>f<\/b> and <b>t<\/b>).<\/p>\n<p>The intl_pluralrules crate creates a set of <a href=\"https:\/\/docs.rs\/intl_pluralrules\/0.9.0\/intl_pluralrules\/operands\/index.html\">operands<\/a> from the input and uses them to determine the plural category.<\/p>\n<p><a href=\"https:\/\/blog.mozilla.org\/l10n\/files\/2018\/07\/operands.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-1285 size-large\" src=\"https:\/\/blog.mozilla.org\/l10n\/files\/2018\/07\/operands-600x341.png\" alt=\"\" width=\"600\" height=\"341\" srcset=\"https:\/\/blog.mozilla.org\/l10n\/files\/2018\/07\/operands-600x341.png 600w, https:\/\/blog.mozilla.org\/l10n\/files\/2018\/07\/operands-252x143.png 252w, https:\/\/blog.mozilla.org\/l10n\/files\/2018\/07\/operands-768x436.png 768w, https:\/\/blog.mozilla.org\/l10n\/files\/2018\/07\/operands.png 1018w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>Notice that the <b>v<\/b> value (number of visible fraction digits) differs between 1 and 1.0. Although 1 and 1.0 are numerically equal, the visible decimal digit can change the plural category in some languages.<\/p>\n<p>English is one of them. Count nouns are rarely written with a trailing decimal zero, but the correct plural category for 1.0 is \u201cother\u201d, not \u201cone\u201d: the page example from the previous section would read \u201cYou have 1.0 open <b><i>pages<\/i><\/b>\u201d rather than \u201cYou have 1.0 open <b><i>page<\/i><\/b>.\u201d This may seem strange, since 1.0 is an unusual way to write a count in English. 
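<\/p>\n<p>A minimal sketch of the operand computation follows. It assumes plain decimal input in string form, which matches how the crate works from the string form of its input. The <code>Operands<\/code> struct and the <code>operands<\/code> and <code>english_is_one<\/code> functions are hypothetical. The crate has its own <code>PluralOperands<\/code> type, whose fields and methods may differ. The English check uses the CLDR cardinal rule for <b>one<\/b>: i = 1 and v = 0.<\/p>
```rust
// Hypothetical sketch of the CLDR plural operands (UTS #35), assuming the
// input is a plain decimal string such as 1, 1.0 or -2.50.
#[derive(Debug, PartialEq)]
pub struct Operands {
    pub n: f64,   // absolute value of the number
    pub i: u64,   // integer digits of n
    pub v: usize, // number of visible fraction digits, with trailing zeros
    pub w: usize, // number of visible fraction digits, without trailing zeros
    pub f: u64,   // visible fraction digits, with trailing zeros
    pub t: u64,   // visible fraction digits, without trailing zeros
}

// Parses a run of ASCII digits, treating an empty run as 0.
fn digits_value(digits: &str) -> u64 {
    if digits.is_empty() { 0 } else { digits.parse().unwrap() }
}

// Builds the operands from the string form of a number.
// Panics if the input is not a plain decimal number.
pub fn operands(input: &str) -> Operands {
    let abs = input.trim().trim_start_matches('-');
    let mut parts = abs.splitn(2, '.');
    let int_part = parts.next().unwrap_or_default();
    let frac_part = parts.next().unwrap_or_default();
    let frac_trimmed = frac_part.trim_end_matches('0');
    Operands {
        n: abs.parse().unwrap(),
        i: digits_value(int_part),
        v: frac_part.len(),
        w: frac_trimmed.len(),
        f: digits_value(frac_part),
        t: digits_value(frac_trimmed),
    }
}

// English cardinal rule for one: i = 1 and v = 0.
pub fn english_is_one(ops: &Operands) -> bool {
    ops.i == 1 && ops.v == 0
}
```
<p>With this sketch, 1 yields v = 0, so the English rule selects <b>one<\/b>. For 1.0 it yields v = 1, so the rule fails and the category falls back to <b>other<\/b>.<\/p>\n<p>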
Nonetheless, under the CLDR rules, <b><i>pages<\/i><\/b> is the correct form for 1.0.<\/p>\n<p>The <a href=\"https:\/\/docs.rs\/intl_pluralrules\/0.9.0\/intl_pluralrules\/operands\/struct.PluralOperands.html\">from<\/a> method generates plural operands by applying Rust&#8217;s <a href=\"https:\/\/doc.rust-lang.org\/nightly\/alloc\/string\/trait.ToString.html\">ToString<\/a> trait to its input and working from the resulting string. This lets a user of the crate pass either string or numeric input; as long as it is a valid float or integer value, it will be accepted. Because Rust\u2019s float types do not preserve trailing zeros when stringified (<code>1.0_f64.to_string()<\/code> yields <code>1<\/code>), a value such as 1.0 should be passed as a string if its visible fraction digits matter.<\/p>\n<h2>CLDR resource parser and code generator<\/h2>\n<p>intl_pluralrules depends on two associated crates that reside in the same <a href=\"https:\/\/github.com\/unclenachoduh\/pluralrules\">GitHub repository<\/a> and are also available on crates.io: <a href=\"https:\/\/crates.io\/crates\/cldr_pluralrules_parser\">cldr_pluralrules_parser<\/a> and <a href=\"https:\/\/crates.io\/crates\/make_pluralrules\">make_pluralrules<\/a>.<\/p>\n<p>cldr_pluralrules_parser parses the plural rules from the <a href=\"https:\/\/github.com\/unicode-cldr\/cldr-json\">JSON CLDR repository<\/a> and builds an abstract syntax tree (AST) that represents them. 
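<\/p>\n<p>To show what such an AST might look like, here is a hypothetical sketch for a small subset of the rule syntax. It covers relations such as i % 10 = 2..4, optionally negated with !=, joined by <b>and<\/b> and <b>or<\/b>. The types are illustrative assumptions, not the ones defined by cldr_pluralrules_parser. The operand values are supplied directly rather than computed.<\/p>
```rust
// Hypothetical AST for a subset of CLDR plural-rule syntax (UTS #35).
// These are illustrative types, not the ones in cldr_pluralrules_parser.
#[derive(Clone, Copy)]
pub enum Operand { N, I, V, W, F, T }

// Left-hand side of a relation: an operand, optionally taken modulo m.
#[derive(Clone, Copy)]
pub enum Expr { Plain(Operand), Mod(Operand, u64) }

// One relation, such as i % 10 = 2..4 or v != 0.
pub struct Relation {
    pub expr: Expr,
    pub negated: bool,                 // true for !=, false for =
    pub ranges: &'static [(u64, u64)], // inclusive ranges; a single value x is (x, x)
}

// A condition is a list of AND-chains joined by OR.
pub struct Condition(pub &'static [&'static [Relation]]);

// Operand values for one number; 1.0 has i = 1 and v = 1, for example.
pub struct Operands { pub n: f64, pub i: u64, pub v: u64, pub w: u64, pub f: u64, pub t: u64 }

fn value(ops: &Operands, op: Operand) -> f64 {
    match op {
        Operand::N => ops.n,
        Operand::I => ops.i as f64,
        Operand::V => ops.v as f64,
        Operand::W => ops.w as f64,
        Operand::F => ops.f as f64,
        Operand::T => ops.t as f64,
    }
}

impl Relation {
    pub fn holds(&self, ops: &Operands) -> bool {
        let x = match self.expr {
            Expr::Plain(op) => value(ops, op),
            Expr::Mod(op, m) => value(ops, op) % m as f64,
        };
        // Only whole values can fall inside a range list.
        let in_range = x.fract() == 0.0
            && self.ranges.iter().any(|&(lo, hi)| x >= lo as f64 && x <= hi as f64);
        in_range != self.negated
    }
}

impl Condition {
    pub fn matches(&self, ops: &Operands) -> bool {
        self.0.iter().any(|chain| chain.iter().all(|rel| rel.holds(ops)))
    }
}

// English cardinal rule for one: i = 1 and v = 0
pub const EN_ONE: Condition = Condition(&[&[
    Relation { expr: Expr::Plain(Operand::I), negated: false, ranges: &[(1, 1)] },
    Relation { expr: Expr::Plain(Operand::V), negated: false, ranges: &[(0, 0)] },
]]);
```
<p>Walking a tree like this for every lookup is how a rule interpreter works. make_pluralrules takes the other route: it turns the AST into ordinary Rust code that is compiled into intl_pluralrules.<\/p>\n<p>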
The following snippet shows the English and Russian plural rules as they appear in CLDR.<\/p>\n<p style=\"padding-left: 30px;\"><a href=\"https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-1.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-1288 size-large\" src=\"https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-1-600x194.png\" alt=\"\" width=\"600\" height=\"194\" srcset=\"https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-1-600x194.png 600w, https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-1-252x81.png 252w, https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-1-768x248.png 768w, https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-1.png 1600w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>make_pluralrules generates a Rust file from that AST. The following snippet shows the generated Rust plural logic for English.<\/p>\n<p style=\"padding-left: 30px;\"><a href=\"https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-0.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-medium wp-image-1287\" src=\"https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-0-252x114.png\" alt=\"\" width=\"252\" height=\"114\" srcset=\"https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-0-252x114.png 252w, https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-0-768x348.png 768w, https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-0-600x272.png 600w, https:\/\/blog.mozilla.org\/l10n\/files\/2018\/08\/pasted-image-0.png 1360w\" sizes=\"(max-width: 252px) 100vw, 252px\" \/><\/a><\/p>\n<p>intl_pluralrules includes the Rust file generated by these crates and uses it to determine the plural category of a number.<\/p>\n<h2>intl_pluralrules crate<\/h2>\n<p>Using intl_pluralrules is a two-step process.<\/p>\n<ol>\n<li>The user first creates an <a 
href=\"https:\/\/docs.rs\/intl_pluralrules\/0.9.0\/intl_pluralrules\/struct.IntlPluralRules.html\">IntlPluralRules<\/a> instance by passing <a href=\"https:\/\/tools.ietf.org\/html\/bcp47\">a BCP 47 language tag<\/a>* and a <a href=\"https:\/\/docs.rs\/intl_pluralrules\/0.9.0\/intl_pluralrules\/enum.PluralRuleType.html\">plural type<\/a> (cardinal or ordinal) to the <a href=\"https:\/\/docs.rs\/intl_pluralrules\/0.8.2\/intl_pluralrules\/struct.IntlPluralRules.html#method.create\">create method<\/a>, which uses the generated Rust file to build an <a href=\"https:\/\/docs.rs\/intl_pluralrules\/0.8.2\/intl_pluralrules\/struct.IntlPluralRules.html\">IntlPluralRules<\/a> object for that locale.<\/li>\n<li>The user then passes a number to the <a href=\"https:\/\/docs.rs\/intl_pluralrules\/0.8.2\/intl_pluralrules\/struct.IntlPluralRules.html#method.select\">select method<\/a> of the IntlPluralRules instance created in step 1.<\/li>\n<\/ol>\n<p>The value returned from step 2 is the plural category.<\/p>\n<p>*You can use the <a href=\"https:\/\/docs.rs\/intl_pluralrules\/0.9.0\/intl_pluralrules\/struct.IntlPluralRules.html#method.get_locales\">get_locales<\/a> method to see which locales are available in the crate.<\/p>\n<h1>Performance and implications<\/h1>\n<p>First, intl_pluralrules has landed in <a href=\"https:\/\/github.com\/projectfluent\/fluent-rs\">fluent-rs<\/a>, so the Rust implementation of Fluent now uses the crate for all plural selection. Because intl_pluralrules relies on Unicode CLDR data, Fluent selects the right plural variant of any message in an FTL file automatically. As long as CLDR provides plural rules for a locale, developers do not need to hard-code plural logic into their software, and localizers do not need to file a bug to get the correct plural string used.<\/p>\n<p>Second, intl_pluralrules is fast. 
The crate is still a prerelease: it is fully functional, but some optimizations are still under discussion. Even without them, it performs well: in a simple benchmark, intl_pluralrules was approximately 20 times faster than <a href=\"http:\/\/www.icu-project.org\/apiref\/icu4c\/classicu_1_1PluralRules.html\">ICU4C&#8217;s C++ PluralRules class<\/a>.<\/p>\n<p>Most of that speed comes from storing plural rules as compiled Rust code rather than as CLDR syntax to be parsed at runtime. cldr_pluralrules_parser and make_pluralrules generate a Rust version of the CLDR rules, which is compiled into the crate. This makes the crate slightly larger, but it avoids parsing the rules at runtime (<a href=\"https:\/\/searchfox.org\/mozilla-central\/rev\/e9d2dce0820fa2616174396459498bcb96ecf812\/intl\/icu\/source\/i18n\/plurrule.cpp#444\">as ICU does<\/a>), which is the main source of the speed difference. Performance is expected to improve further as intl_pluralrules moves toward 1.0.<\/p>\n<p>In the bigger picture, the release of intl_pluralrules means that the Rust ecosystem gains a higher-level internationalization and localization API, hopefully the first of several. 
Conversely, the internationalization and localization community gains an API that benefits from the performance of the Rust language.<\/p>\n<p>Relevant Links:<\/p>\n<ul>\n<li><a href=\"https:\/\/crates.io\/crates\/intl_pluralrules\">intl_pluralrules on crates.io<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/unclenachoduh\/pluralrules\">GitHub Repo for all three crates<\/a><\/li>\n<li><a href=\"http:\/\/unicode.org\/reports\/tr35\/tr35-numbers.html#Language_Plural_Rules\">Unicode CLDR Plural Rules<\/a><\/li>\n<li><a href=\"https:\/\/projectfluent.org\/\">Project Fluent<\/a><\/li>\n<\/ul>\n<p>intl_pluralrules Developers:<\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/unclenachoduh\">Kekoa Riggin @unclenachoduh<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/zbraniecki\">Zibi Braniecki @zbraniecki<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>intl_pluralrules is a Rust crate, built to handle pluralization. Pluralization is the foundation for all localization and many internationalization APIs. 
With the addition of intl_pluralrules, any locale-aware date-, time- or &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/l10n\/2018\/08\/03\/intl_pluralrules-a-rust-crate-for-handling-plural-forms-with-cldr-plural-rules\/\">Read more<\/a><\/p>\n","protected":false},"author":1498,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[286410,610],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/l10n\/wp-json\/wp\/v2\/posts\/1284"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/l10n\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/l10n\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/l10n\/wp-json\/wp\/v2\/users\/1498"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/l10n\/wp-json\/wp\/v2\/comments?post=1284"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/l10n\/wp-json\/wp\/v2\/posts\/1284\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/l10n\/wp-json\/wp\/v2\/media?parent=1284"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/l10n\/wp-json\/wp\/v2\/categories?post=1284"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/l10n\/wp-json\/wp\/v2\/tags?post=1284"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}