Efficient float32 arithmetic in JavaScript

There are several ways to represent floating-point numbers in computers: most architectures now use the IEEE 754 standard, representing double-precision numbers with 64 bits (a.k.a. double, or float64) and single-precision numbers with 32 bits (a.k.a. float32). As its name suggests, a float64 has more precision than a float32, so it is generally the advised choice, unless you are in a performance-sensitive context. Using float32 has the following advantages:

  • Float32 operations often require fewer CPU cycles, since less precision needs to be computed.
  • There is additional CPU overhead if the operands to a float64 operation are float32, since they must first be converted to float64. (This can be the case in code that uses Float32Array to store data, which is common for JS code using WebGL, as well as in Emscripten-compiled code, where a C++ float is stored as a float32 in memory but operated on with JavaScript's float64 arithmetic.)
  • Float32 numbers take half the space, which can decrease the total memory used by the application and increase the speed of memory-bound computations.
  • Float32 specializations of math functions in several standard C libraries we’ve tested are way faster than their float64 equivalents.

In JavaScript, number values are defined to be float64 and all number arithmetic is defined to use float64 arithmetic. Now, one useful property of float64 arithmetic that JavaScript engines have taken advantage of for a long time is that, when the operands are small-enough integers and there is no integer overflow, a float64 operation gives the same result as the corresponding integer operation. JavaScript engines take advantage of this by representing integer-valued numbers as raw integers and using integer arithmetic (and checking for overflow). In fact, there are at least 3 different integer representations in use that I know of: 31-bit integers, int32_t and uint32_t (see this post about value representation by Andy Wingo for more).
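
For instance (an illustration, not engine code), integer-valued doubles stay exact up to 2^53, which is what makes substituting integer arithmetic safe for small-enough values:

  // Integer-valued float64s are exact up to 2^53.
  var limit = Math.pow(2, 53);
  console.log(limit - 1 + 1 === limit); // true: still exact below 2^53
  console.log(limit + 1 === limit);     // true: precision is lost past 2^53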

Given all this, a good question is: can we do a similar optimization for float32? Turns out, the answer is “sometimes” and, using the new Math.fround function in the upcoming ES6 spec, the programmer has a good way to control when.

When can we safely use float32 operations instead of float64 operations?

There is an interesting commutative identity satisfied by some float64 and float32 operations, as stated by Samuel A. Figueroa in “When is double rounding innocuous?” (SIGNUM Newsl. 30, 3, July 1995, 21-26): as long as the original inputs are float32, you can either use float32 operations or float64 operations and obtain the same float32 result. More precisely, for op one of {+, -, *, /}, with op_f32 the float32 version of op, op_f64 its float64 version, and x, y float32 values, the following identity holds, expressed as C++ code:

  assert(op_f32(x,y) == (float) op_f64( (double)x, (double)y ));

The analogous unary identity also holds for sqrt and several other Math functions.

This property relies crucially on the casts before and after every single operation. For instance, if x = 1024, y = 0.0001, and z = 1024, then (x+y)+z does not produce the same result when computed as two float32 additions as it does when computed as two float64 additions.
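
Here is that example in JavaScript, using Math.fround (described below) to force the intermediate rounding to float32:

  var x = 1024;
  var y = Math.fround(0.0001); // round 0.0001 to the nearest float32 so all inputs are float32
  var z = 1024;

  // Two float64 additions: the intermediate sum keeps full double precision.
  var d = (x + y) + z;                         // ~2048.0001

  // Two float32 additions: each intermediate result is rounded to float32.
  var f = Math.fround(Math.fround(x + y) + z); // 2048

  console.log(d === f); // false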

This identity provides the preconditions that allow a compiler to soundly use float32 instructions instead of float64 instructions. Indeed, gcc will take advantage of this identity and, for expressions of the form on the right of the equation, will generate float32 code corresponding to the expression on the left.

But when does JavaScript ever have a float32 value? In HTML5 (with WebGL and, thus, Typed Arrays), the answer is: when reading from or writing to a Float32Array. For example, take this piece of code:

  var f32 = new Float32Array(1000);
  for(var i = 0; i < 1000; ++i)
    f32[i] = f32[i] + 1;

The addition inside the loop exactly matches the identity above: we take a float32 value from f32, convert it to a float64, do a float64 addition, and cast the result back to a float32 to store in f32. (We can view the literal 1 as a float32 1 cast to a float64 1 since 1 is precisely representable by a float32.)

But what if we want to build more complicated expressions? We could insert unnecessary Float32Array loads and stores between each subexpression, so that every operation's operands come from a load and its result is always stored back to a Float32Array. But these additional loads and stores would make our code slower, and the whole point is to be fast. Yes, a sufficiently smart compiler might be able to eliminate most of these loads/stores, but performance predictability is important, so the less fragile we can make this optimization, the better. Instead, we proposed a tiny builtin that was accepted into the upcoming ES6 language spec: Math.fround.
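
For contrast, the store-and-reload workaround rejected above would look roughly like this (tmp is just a hypothetical scratch Float32Array):

  // Every intermediate result is bounced through a scratch Float32Array
  // element so that it gets rounded to float32.
  var tmp = new Float32Array(1);
  var f32 = new Float32Array(1000);
  for (var i = 0; i < 999; ++i) {
    tmp[0] = f32[i] + f32[i + 1]; // the store rounds the sum to float32
    f32[i] = tmp[0] + 1;          // reload, add, and the final store rounds again
  }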

Math.fround

Math.fround is a new Math function proposed for the upcoming ES6 standard. This function rounds its input to the closest float32 value, returning this float32 value as a number. Thus, Math.fround is semantically equivalent to the polyfill:

  if (!Math.fround) {
    Math.fround = (function() {
      // Storing into a Float32Array element rounds the value to float32;
      // reading it back yields that float32 value as a regular number.
      var temp = new Float32Array(1);
      return function fround(x) {
        temp[0] = +x;
        return temp[0];
      };
    })();
  }
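
For example, here are a few calls (the comments show the returned double, which is the closest float32 to each input):

  Math.fround(2);   // 2   (small integers are exactly representable as float32)
  Math.fround(1.5); // 1.5 (also exact)
  Math.fround(0.1); // 0.10000000149011612 (the closest float32 to 0.1)
  Math.fround(NaN); // NaN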

Note that some browsers don't support Typed Arrays; for these, more complex polyfills are available. The good news is that Math.fround is already implemented both in SpiderMonkey (the JavaScript engine behind Firefox) and in JavaScriptCore (the JavaScript engine behind Safari). Moreover, the V8 team plans to add it as well, as stated in this issue.

As a result, the way to chain float32 operations is simply to wrap any temporary result in a call to Math.fround:

  var f32 = new Float32Array(1000);
  for(var i = 0; i < 999; ++i)
    f32[i] = Math.fround(f32[i] + f32[i+1]) + 1;

In addition to allowing the programmer to write faster code, this also allows JS compilers, like Emscripten, to better compile float32 arithmetic in the source language. For example, Emscripten currently compiles C++ float operations to JavaScript number operations. Technically, Emscripten could use Float32Array loads/stores after every operation to throw away the extra float64 precision, but this would be a big slowdown, so fidelity is sacrificed for performance. Although it's quite rare for this difference to break anything (if it does, the program is likely depending on unspecified behavior), we have seen it cause real bugs in the field, and these are not fun to track down. With Math.fround, Emscripten would be able to be both more efficient and higher fidelity!
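
To illustrate, for a C++ statement like float r = a * b + c;, a compiler could emit float32-accurate JavaScript along these lines (hypothetical output, not what Emscripten emits today):

  // Hypothetical float32-accurate output for the C++ statement
  //   float r = a * b + c;
  // Each intermediate result is rounded back to float32 with Math.fround.
  r = Math.fround(Math.fround(a * b) + c);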

Float32 in IonMonkey

My internship project was to bring these optimizations to Firefox. The first step was to add general support for float32 operations in the IonMonkey JIT backend. Next, I added Math.fround as a general (unoptimized) builtin to the JavaScript engine. Finally, I added an optimization pass that recognizes Float32Array accesses and Math.fround calls and uses the identity described above to emit float32 operations when possible. These optimizations are enabled in Firefox 27 (which is currently in the Aurora release channel).

So, how does it perform? Micro-benchmarks (in both C++ and JavaScript) show large speedups, up to 50%. But micro-benchmarks are often misleading, so I wrote the following more realistic benchmarks to see what sort of speedups we can expect to see in practice on float32-heavy computations:

  • Matrix inversions: this benchmark creates a bunch of matrices, inverts them and multiplies each result back with the original matrix, making it possible to compare the precision loss when using float32 or float64. It uses an adapted version of gl-matrix, a library used by real-world WebGL applications for matrix operations. For the float32 version, only calls to Math.fround have been added.

  • Matrix graphics: this benchmark also creates a bunch of matrices and applies some operations that are frequently used in graphics: translation, rotation, scaling, etc. It mixes a lot of basic operations with more complex ones (like calls to Math.cos and Math.sin for the rotations). Thus, it shows great improvements when the float32 equivalents of these functions are faster. Once more, it uses the adapted version of gl-matrix.

  • Exponential: this benchmark fills a big Float32Array with predictable values and then computes the exponential of each element using the first terms of the exponential's power series. The main purpose of this benchmark is just to pound on addition and multiplication (a minimal sketch of this kind of kernel follows the list).

  • Fast Fourier Transform: this benchmark creates a fake sample buffer and then applies several steps of a Fast Fourier Transform. It also consists mostly of basic operations plus some calls to Math.sqrt. The FFT code is taken from an existing library, dsp.js.
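
As mentioned in the Exponential item above, here is a minimal sketch (not the benchmark's actual code) of that kind of kernel: exp(x) approximated by the first terms of its power series, with every intermediate result rounded to float32:

  function expApprox(x) {
    x = Math.fround(x);
    var sum = Math.fround(1);
    var term = Math.fround(1);
    for (var n = 1; n < 10; ++n) {
      term = Math.fround(Math.fround(term * x) / n); // term is x^n / n!
      sum = Math.fround(sum + term);
    }
    return sum;
  }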

The following table shows results on several different devices, when run on the latest Firefox Nightly. Each number indicates the speedup obtained by using code that has been optimized with Math.fround to allow the float32 optimization described above (thus the higher, the better). The desktop machine used is a Lenovo ThinkPad W530 (Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz, 4 cores / 8 threads, 16 GB RAM). When a line indicates a phone or tablet device, the device runs the latest Firefox Nightly for Android. Once you've read these results, you can try to run these benchmarks yourself! (Don't forget to use Firefox 27 or greater!) You can see the benchmark source on GitHub (on the gh-pages branch).

Device                    Matrix Inversions   Matrix Graphics   Exponential   FFT
Desktop (x86)             33%                 60%               30%           16%
Google Nexus 10 (ARM)     12%                 38%               33%           25%
Google Nexus 4 (ARM)      42%                 26%               38%           5%
Samsung Galaxy S3 (ARM)   38%                 38%               24%           33%

Polyfilling Math.fround

What can we do before Math.fround is available and optimized in all JS engines? Instead of using a faithful polyfill like the one shown above, we can simply use the identity function:

  var fround = Math.fround || function(x) { return x; };

This is what the above benchmarks use, and, as stated above, most code won't notice the difference.
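
For instance, a hot loop written against this fallback (the scaleAndAdd function below is just a made-up example) looks the same whether or not the engine provides Math.fround:

  var fround = Math.fround || function(x) { return x; }; // the fallback shown above

  // Scales every element of the Float32Array dst by k and adds the
  // corresponding element of src. With a real Math.fround the engine can
  // use float32 arithmetic; with the identity fallback it runs in float64.
  function scaleAndAdd(dst, src, k) {
    k = fround(k);
    for (var i = 0; i < dst.length; ++i)
      dst[i] = fround(dst[i] * k) + src[i]; // intermediate product rounded to float32
  }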

What's nice is that all modern JS engines will usually inline small functions in their high-end JIT, so this polyfill shouldn't penalize performance. We can see that this is the case when running the four benchmarks shown above in, e.g., Chrome Dev. However, we have seen some cases in larger codebases where inlining is not performed (perhaps the maximum inlining depth was hit, or the function wasn't compiled by the high-end JIT for some reason) and performance suffers with the polyfill. So, in the short term, it's definitely worth a try, but be sure to test.

Conclusion

The results are very encouraging. Since Math.fround is in the next ES6 standard, we are hopeful that other JS engines will choose to make the same optimizations. With the Web as a gaming platform, low-level optimizations like these are increasingly important and will allow the Web to get ever closer to native performance. Feel free to test these optimizations out in Firefox Nightly or Aurora and let us know about any bugs you find.

I would like to thank all those who participated in making this happen: Jon Coppeard and Douglas Crosher for implementing the ARM parts, Luke Wagner, Alon Zakai and Dave Herman for their proof-reading and feedback, and more generally everyone on the JavaScript team for their help and support.

8 responses

  1. Jukka Jylänki wrote:

    This is great! Can't wait to run MathGeoLib benchmarks through the new optimized code path!

  2. Arpad Borsos wrote:

    Next step then is to support native int64 🙂

  3. Caspy7 wrote:

    When did this land?
    I'm curious to have a look at arewefastyet and see what kind of difference it might have made in the different benchmarks.

    1. Luke Wagner wrote:

      awfy does not yet contain benchmarks that contain the Math.fround calls necessary to take advantage of the optimization. We'll be adding these soon, though.

  4. Abhishek Shukla wrote:

    It will be really interesting to see how much Math.fround reduces memory consumption.

  5. Martin Rinehart wrote:

    WebGL is built on the lightweight version of OpenGL, which is specifically lightened for small, battery-powered devices such as phones and tablets. It uses 32-bit floats. So BRAVO!

    That said, your GPU (even in an only semi-smart phone) can do matrix multiplication, and WebGL can already take JS values and create an array of 32-bit floats. Doing matrix algebra in your CPU is silly. So there's already an alternative, optimized for 32-bit floats and available in any browser that supports WebGL. (That's every browser not made in the state of Washington, I believe.)

  6. zibin wrote:

    This is really interesting. I wonder if the polyfill improvement is as good as fround itself.

    1. jwalden wrote:

      In practice, it'll probably mostly depend on how much the JS engine implements these sorts of float32 optimizations. Converting a double to a float, then converting it back to a double for handoff to generic JS, is slower than having the engine itself be smart about it. It'll also depend on the code you've written. If you're just feeding your float32 back into things that expect a double anyway (like, say, into a Float64Array element), any gains are going to disappear. If you try to write slower code, you usually can. 🙂
