{"id":612,"date":"2013-11-07T18:00:16","date_gmt":"2013-11-07T18:00:16","guid":{"rendered":"http:\/\/blog.mozilla.org\/javascript\/?p=612"},"modified":"2013-11-08T14:54:58","modified_gmt":"2013-11-08T14:54:58","slug":"efficient-float32-arithmetic-in-javascript","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/javascript\/2013\/11\/07\/efficient-float32-arithmetic-in-javascript\/","title":{"rendered":"Efficient float32 arithmetic in JavaScript"},"content":{"rendered":"<p>There are several different ways to represent floating-point numbers in computers: most architectures now use the IEEE754 standards, representing double precision numbers with 64 bits (a.k.a double, or float64) and single precision numbers with 32 bits (a.k.a float32). As its name suggests, a float64 has more precision than a float32, so it&#8217;s generally advised to use it, unless you are in a performance-sensitive context. Using float32 has the following advantages:<\/p>\n<ul>\n<li>Float32 operations often require less CPU cycles as they need less precision.<\/li>\n<li>There is additional CPU overhead if the operands to a float64 operation are float32, since they must first be converted to float64. 
(This can be the case in code that uses <code>Float32Array<\/code> to store data, which is often true for JS code using WebGL, as well as <a href=\"http:\/\/www.emscripten.org\">Emscripten<\/a>-compiled code where a <code>float<\/code> in C++ is stored as a float32 in memory but uses JavaScript float64 arithmetic.)<\/li>\n<li>Float32 numbers take half the space, which can decrease the total memory used by the application and increase the speed of memory-bound computations.<\/li>\n<li>Float32 specializations of math functions in several standard C libraries we&#8217;ve tested are way faster than their float64 equivalents.<\/li>\n<\/ul>\n<p>In JavaScript, <a href=\"https:\/\/people.mozilla.org\/~jorendorff\/es6-draft.html#sec-terms-and-definitions-number-value\">number values<\/a> are defined to be float64 and all number arithmetic is defined to use float64 arithmetic. Now, one useful property of float64 arithmetic that JavaScript engines have taken advantage of for a long time is that, when a float64 is a small-enough integer, the result of a float64 operation is the same as that of the corresponding integer operation, as long as there is no integer overflow. JavaScript engines take advantage of this by representing integer-valued numbers as raw integers and using integer arithmetic (and checking for overflow).  In fact, there are at least three different integer representations in use that I know of: 31-bit integers, <code>int32_t<\/code>, and <code>uint32_t<\/code> (see <a href=\"http:\/\/wingolog.org\/archives\/2011\/05\/18\/value-representation-in-javascript-implementations\">this post about value representation<\/a> by Andy Wingo for more).<\/p>\n<p>Given all this, a good question is: can we do a similar optimization for float32?  
It turns out the answer is &#8220;sometimes&#8221; and, using the new <a href=\"https:\/\/people.mozilla.org\/~jorendorff\/es6-draft.html#sec-math.fround\"><code>Math.fround<\/code><\/a> function in the upcoming ES6 spec, the programmer has a good way to control when.<\/p>\n<h3>When can we safely use float32 operations instead of float64 operations?<\/h3>\n<p>\n  There is an interesting commutative identity satisfied by some float64 and float32 operations, as stated by Samuel A. Figueroa in &#8220;<a href=\"http:\/\/dl.acm.org\/citation.cfm?id=221334\">When is double rounding innocuous?<\/a>&#8221; (SIGNUM Newsl. 30, 3, July 1995, 21-26): as long as the original inputs are float32, you can either use float32 operations or float64 operations and obtain the same float32 result. More precisely, for <code>op<\/code> one of <code>{+,-,*,\/}<\/code>, with <code>op_f32<\/code> and <code>op_f64<\/code> its float32 and float64 overloads, and <code>x<\/code>, <code>y<\/code> float32 values, the following identity holds, expressed as C++ code:<\/p>\n<pre>\r\n  assert(op_f32(x,y) == (float) op_f64( (double)x, (double)y ));\r\n<\/pre>\n<p>The analogous unary identity also holds for <code>sqrt<\/code> and several other Math functions.\n<\/p>\n<p>This property relies crucially on the casts before and after every single operation. For instance, if x = 1024, y = 0.0001, and z = 1024, <code>(x+y)+z<\/code> doesn&#8217;t have the same result when computed as two float32 additions as when computed as two float64 additions.\n<\/p>\n<p>This identity provides the preconditions that allow a compiler to soundly use float32 instructions instead of float64 instructions.  Indeed, gcc will take advantage of this identity and, for expressions of the form on the right of the equation, will generate float32 code corresponding to the expression on the left.<\/p>\n<p>But when does JavaScript ever have a float32 value?  
In HTML5 (with WebGL and, thus, Typed Arrays), the answer is: when reading from or writing to a <code>Float32Array<\/code>.  For example, take this piece of code:<\/p>\n<pre>\r\n  var f32 = new Float32Array(1000);\r\n  for(var i = 0; i < 1000; ++i)\r\n    f32[i] = f32[i] + 1;\r\n<\/pre>\n<p>The addition inside the loop exactly matches the identity above: we take a float32 value from <code>f32<\/code>, convert it to a float64, do a float64 addition, and cast the result back to a float32 to store in <code>f32<\/code>.  (We can view the literal <code>1<\/code> as a float32 <code>1<\/code> cast to a float64 <code>1<\/code> since <code>1<\/code> is precisely representable by a float32.)\n<\/p>\n<p>\nBut what if we want to build more complicated expressions?  Well, we <em>could<\/em> insert unnecessary <code>Float32Array<\/code> loads and stores between each subexpression so that every operation's operands were loads and the result was always stored to a <code>Float32Array<\/code>, but these additional loads and stores would make our code slower, and the whole point is to be fast. Yes, a <a href=\"http:\/\/c2.com\/cgi\/wiki?SufficientlySmartCompiler\">sufficiently smart compiler<\/a> might be able to eliminate most of these loads\/stores, but performance predictability is important, so the less fragile we can make this optimization, the better.  Instead, we proposed a tiny builtin that was accepted into the upcoming ES6 language spec: <code>Math.fround<\/code>.\n<\/p>\n<h3>Math.fround<\/h3>\n<p><a href=\"https:\/\/people.mozilla.org\/~jorendorff\/es6-draft.html#sec-math.fround\" title='Math.fround in the ES6 draft spec'><code>Math.fround<\/code><\/a> is a new Math function proposed for the upcoming ES6 standard. This function rounds its input to the closest float32 value, returning this float32 value as a number.  
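For instance (a quick sketch, relying only on the rounding semantics just described): values exactly representable as a float32 round to themselves, while others pick up float32 rounding error:

```javascript
// 1.5 has an exact float32 representation, so fround leaves it unchanged.
console.log(Math.fround(1.5) === 1.5);  // true

// 0.1 has no exact float32 representation: fround returns the nearest
// float32 value, which differs from the float64 0.1.
console.log(Math.fround(0.1) === 0.1);  // false
console.log(Math.fround(0.1));          // 0.10000000149011612
```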
Thus, <code>Math.fround<\/code> is semantically equivalent to the polyfill:<br \/>\n<a id='polyfill'><\/p>\n<pre>\r\n  if (!Math.fround) {\r\n    Math.fround = (function() {\r\n      var temp = new Float32Array(1);\r\n      return function fround(x) {\r\n        temp[0] = +x;\r\n        return temp[0];\r\n      }\r\n    })();\r\n  }\r\n<\/pre>\n<p><\/a><\/p>\n<p>Note that some browsers don't support Typed Arrays; for these, more complex polyfills are <a href=\"https:\/\/github.com\/inexorabletash\/polyfill\/blob\/master\/typedarray.js\">available<\/a>. The good news is that <code>Math.fround<\/code> is already implemented in both SpiderMonkey (the JavaScript engine behind Firefox) and JavaScriptCore (the JavaScript engine behind Safari). Moreover, the V8 team plans to add it as well, as <a href=\"http:\/\/code.google.com\/p\/v8\/issues\/detail?id=2938\">this issue<\/a> states.<\/p>\n<p>As a result, the way to chain float32 operations is simply to wrap any temporary result in a call to <code>Math.fround<\/code>:<\/p>\n<pre>\r\n  var f32 = new Float32Array(1000);\r\n  for(var i = 0; i < 999; ++i)\r\n    f32[i] = Math.fround(f32[i] + f32[i+1]) + 1;\r\n<\/pre>\n<p>In addition to allowing the programmer to write faster code, this also allows JS compilers, like <a href=\"http:\/\/emscripten.org\/\">Emscripten<\/a>, to better compile float32 in the source language.  For example, Emscripten currently compiles C++ <code>float<\/code> operations to JavaScript number operations.  Technically, Emscripten could use <code>Float32Array<\/code> loads\/stores after every operation to throw away the extra float64 precision, but this would be a big slowdown, so fidelity is sacrificed for performance.  Although it's quite rare for this difference to break anything (if it does, the program is likely depending on unspecified behavior), we have seen it cause real bugs in the field, and these are not fun to track down. 
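The earlier x = 1024, y = 0.0001, z = 1024 example makes the kind of divergence at stake here concrete (a small sketch, using Math.fround to emulate float32 semantics):

```javascript
var x = 1024, y = Math.fround(0.0001), z = 1024;

// Two float64 additions keep the extra precision:
var f64 = (x + y) + z;                          // 2048.000100000...

// Two float32 additions round after each step, and the small term
// is eventually rounded away entirely:
var f32 = Math.fround(Math.fround(x + y) + z);  // 2048

console.log(f64 === f32);  // false
```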
With <code>Math.fround<\/code>, Emscripten could be both more efficient <em>and<\/em> higher fidelity!<\/p>\n<h3>Float32 in IonMonkey<\/h3>\n<p>My <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=900120\">internship project<\/a> was to bring these optimizations to Firefox. The first step was to add general support for float32 operations in the IonMonkey JIT backend. Next, I added <code>Math.fround<\/code> as a general (unoptimized) builtin to the JavaScript engine. Finally, I added an optimization pass that recognizes <code>Float32Array<\/code>\/<code>Math.fround<\/code> and uses their commutative properties to emit float32 operations when possible.  These optimizations are enabled in Firefox 27 (which is currently in the <a href=\"http:\/\/www.mozilla.org\/en-US\/firefox\/aurora\/\">Aurora<\/a> release channel).<\/p>\n<p>So, how does it perform?  Microbenchmarks (in both C++ and JavaScript) show large speedups, up to 50%.  But microbenchmarks are often misleading, so I wrote the following more realistic benchmarks to see what sort of speedups we can expect in practice on float32-heavy computations:<\/p>\n<ul>\n<li>\n<p><strong>Matrix inversions<\/strong>: this benchmark creates a bunch of matrices, inverts them, and multiplies them back with the originals, in order to compare the precision loss when using float32 or float64. It uses an adapted version of <a href=\"https:\/\/github.com\/toji\/gl-matrix\" title=\"gl-matrix link on github\">gl-matrix<\/a>, which is a library used in real-world WebGL applications. For the float32 version, only calls to <code>Math.fround<\/code> have been added.<\/p>\n<\/li>\n<li>\n<p><strong>Matrix graphics<\/strong>: this benchmark also creates a bunch of matrices and applies some operations that are frequently used in graphics: translation, rotation, scaling, etc. 
This one uses a lot of basic operations as well as more complex ones (like calls to <code>Math.cos<\/code> and <code>Math.sin<\/code> for the rotation). Thus, it shows great improvements when the float32 forms of these functions are faster. Once more, it uses the adapted version of gl-matrix.<\/p>\n<li>\n<p><strong>Exponential<\/strong>: this benchmark fills a big <code>Float32Array<\/code> with predictable values and then computes the exponential of each element by using the first terms of the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Exponential_function#Formal_definition\" title=\"wikipedia link to exponential function\">exponential's power series<\/a>. The main purpose of this benchmark is just to pound on addition and multiplication.<\/p>\n<li><strong>Fast Fourier Transform<\/strong>: this benchmark creates a fake sample buffer and then applies several steps of the Fast Fourier Transform. It also consists of basic operations and some calls to <code>Math.sqrt<\/code>. The FFT code is taken from an existing library, <a href=\"https:\/\/github.com\/corbanbrook\/dsp.js\" title=\"link to github repo of dsp.js\">dsp.js<\/a>.<\/p>\n<\/ul>\n<p>The following table shows results on several different devices, when run on the latest Firefox Nightly. Each number indicates the speedup obtained from code optimized to use <code>Math.fround<\/code> to enable the float32 optimization described above (the higher, the better). The desktop machine used is a Lenovo ThinkPad W530 (Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz, 4 cores \/ 8 threads, 16 GB RAM). When a line indicates a phone or tablet, the device runs the latest Firefox Nightly for Android. Once you've read these results, you can <a href=\"http:\/\/benjbouv.github.io\/floats32-bench\">run these benchmarks<\/a> yourself! (Don't forget to use Firefox 27 or greater!) 
You can see the benchmark source <a href=\"https:\/\/github.com\/BenjBouv\/floats32-bench\">on github<\/a> (on the gh-pages branch).<\/p>\n<table>\n<tr>\n<th>Device<\/th>\n<th><a href=\"http:\/\/benjbouv.github.io\/floats32-bench\/gl-matrix\/\">Matrix Inversions<\/a><\/th>\n<th><a href=\"http:\/\/benjbouv.github.io\/floats32-bench\/gl-matrix\/graphics.html\">Matrix Graphics<\/a><\/th>\n<th><a href=\"http:\/\/benjbouv.github.io\/floats32-bench\/exp\/\">Exponential<\/a><\/th>\n<th><a href=\"http:\/\/benjbouv.github.io\/floats32-bench\/fft\/\">FFT<\/a><\/th>\n<\/tr>\n<tr>\n<td>Desktop (x86)<\/td>\n<td>33%<\/td>\n<td>60%<\/td>\n<td>30%<\/td>\n<td>16%<\/td>\n<\/tr>\n<tr>\n<td>Google Nexus 10 (ARM)<\/td>\n<td>12%<\/td>\n<td>38%<\/td>\n<td>33%<\/td>\n<td>25%<\/td>\n<\/tr>\n<tr>\n<td>Google Nexus 4 (ARM)<\/td>\n<td>42%<\/td>\n<td>26%<\/td>\n<td>38%<\/td>\n<td>5%<\/td>\n<\/tr>\n<tr>\n<td>Samsung Galaxy S3 (ARM)<\/td>\n<td>38%<\/td>\n<td>38%<\/td>\n<td>24%<\/td>\n<td>33%<\/td>\n<\/tr>\n<\/table>\n<h3>Polyfilling Math.fround<\/h3>\n<p>What can we do before <code>Math.fround<\/code> is available and optimized in all JS engines?  Instead of using a faithful polyfill like the one shown <a href=\"#polyfill\">above<\/a>, we can simply use the identity function:<\/p>\n<pre>\r\n  var fround = Math.fround || function(x) { return x }\r\n<\/pre>\n<p>This is what the above benchmarks use, and, as stated above, most code won't notice the difference.<\/p>\n<p>What's nice is that all modern JS engines will usually inline small functions in their high-end JIT, so this polyfill shouldn't penalize performance.  We can see this to be the case when running the four benchmarks shown above in, e.g., Chrome Dev.  However, we have seen some cases in larger codebases where inlining is not performed (perhaps the max inlining depth was hit or the function wasn't compiled by the high-end JIT for some reason) and performance suffers with the polyfill.  
So, in the short term, it's definitely worth a try, but be sure to test.<\/p>\n<h3>Conclusion<\/h3>\n<p>The results are very encouraging.  Since <code>Math.fround<\/code> is in the next ES6 standard, we are hopeful that other JS engines will choose to make the same optimizations.  With the <a href=\"https:\/\/brendaneich.com\/2013\/03\/the-web-is-the-game-platform\/\">Web as the game platform<\/a>, low-level optimizations like these are increasingly important and will allow the Web to get ever closer to native performance.  Feel free to test these optimizations out in Firefox <a href=\"http:\/\/nightly.mozilla.org\/\">Nightly<\/a> or <a href=\"http:\/\/www.mozilla.org\/en-US\/firefox\/aurora\/\">Aurora<\/a> and let us know about any bugs you find.<\/p>\n<p>I would like to thank all those who participated in making this happen: Jon Coppeard and Douglas Crosher for implementing the ARM parts, Luke Wagner, Alon Zakai and Dave Herman for their proof-reading and feedback, and more generally everyone on the JavaScript team for their help and support.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There are several different ways to represent floating-point numbers in computers: most architectures now use the IEEE754 standards, representing double precision numbers with 64 bits (a.k.a double, or float64) and single precision numbers with 32 bits (a.k.a float32). 
As its &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/javascript\/2013\/11\/07\/efficient-float32-arithmetic-in-javascript\/\">Continue reading<\/a><\/p>\n","protected":false},"author":660,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25569,31],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/posts\/612"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/users\/660"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/comments?post=612"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/posts\/612\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/media?parent=612"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/categories?post=612"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/tags?post=612"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}