There are several different ways to represent floating-point numbers in computers: most architectures now use the IEEE 754 standard, representing double precision numbers with 64 bits (a.k.a. double, or float64) and single precision numbers with 32 bits (a.k.a. float32). As its name suggests, a float64 has more precision than a float32, so it’s generally advised to use it, unless you are in a performance-sensitive context. Using float32 has the following advantages:
- Float32 operations often require fewer CPU cycles as they need less precision.
- There is additional CPU overhead if the operands to a float64 operation are float32, since they must first be converted to float64. (This can be the case in code that uses Float32Array to store data, which is often true for JS code using WebGL, as well as Emscripten-compiled code where a float in C++ is stored as a float32 in memory but uses JavaScript float64 arithmetic.)
- Float32 numbers take half the space, which can decrease the total memory used by the application and increase the speed of memory-bound computations.
- Float32 specializations of math functions in several standard C libraries we’ve tested are way faster than their float64 equivalents.
In JavaScript, number values are defined to be float64 and all number arithmetic is defined to use float64 arithmetic. Now, one useful property of float64 arithmetic that JavaScript engines have taken advantage of for a long time is that, when a float64 is a small-enough integer, the result of a float64 operation is the same as the corresponding integer operation when there is no integer overflow. JavaScript engines take advantage of this by representing integer-valued numbers as raw integers and using integer arithmetic (and checking for overflow). In fact, there are at least 3 different integer representations in use that I know of: 31-bit integers, int32_t and uint32_t (see this post about value representation by Andy Wingo for more).
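The property can be observed from JavaScript itself. The snippet below is only a minimal sketch: the helper name addAsInt32 is made up for illustration, and it mimics int32 wrap-around with `| 0` rather than showing how any engine is actually implemented.

```javascript
// Hypothetical helper: add two numbers with int32 wrap-around semantics
// by truncating the float64 sum to a 32-bit integer with `| 0`.
function addAsInt32(a, b) {
  return (a + b) | 0;
}

var a = 123456, b = 654321;
// No overflow: the float64 sum and the int32 sum are the same value.
var sameWithoutOverflow = (a + b) === addAsInt32(a, b);

var big = 2147483647; // largest int32 value
// Overflow: the float64 sum keeps growing, the int32 sum wraps around.
var sameWithOverflow = (big + 1) === addAsInt32(big, 1);

console.log(sameWithoutOverflow, sameWithOverflow);
```

The second comparison is the reason an engine that uses integer arithmetic internally still has to check for overflow and fall back to float64.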
Given all this, a good question is: can we do a similar optimization for float32? Turns out, the answer is “sometimes” and, using the new Math.fround function in the upcoming ES6 spec, the programmer has a good way to control when.
When can we safely use float32 operations instead of float64 operations
There is an interesting commutative identity satisfied by some float64 and float32 operations, as stated by Samuel A. Figueroa in “When is double rounding innocuous?” (SIGNUM Newsl. 30, 3, July 1995, 21-26): as long as the original inputs are float32, you can either use float32 operations or float64 operations and obtain the same float32 result. More precisely, for op one of {+,-,*,/}, op_f32 the float32 overload and op_f64 the float64 overload, and x, y float32 values, the following identity holds, expressed as C++ code:
assert(op_f32(x,y) == (float) op_f64( (double)x, (double)y ));
The analogous unary identity also holds for sqrt and several other Math functions.
This property relies crucially on the casts before and after every single operation. For instance, if x = 1024, y = 0.0001, and z = 1024, (x+y)+z doesn’t have the same result when computed as two float32 additions as when computed as two float64 additions.
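The need for a cast after every operation can be checked directly. The sketch below deliberately uses different values from the ones in the text (1 and 2^-24), chosen because every intermediate result is easy to verify by hand; toFloat32 is a hypothetical helper that rounds through a Float32Array.

```javascript
// Hypothetical helper: round a number to the nearest float32 by storing it
// in a Float32Array and reading it back.
var scratch = new Float32Array(1);
function toFloat32(value) {
  scratch[0] = value;
  return scratch[0];
}

var x = 1;
var y = Math.pow(2, -24); // exactly representable as a float32
var z = Math.pow(2, -24);

// Cast after every operation: behaves like two float32 additions.
// 1 + 2^-24 lies halfway between two float32 values and rounds back to 1.
var roundedEachStep = toFloat32(toFloat32(x + y) + z);

// Cast only once at the end: two float64 additions, then a single rounding.
// The float64 sum is exactly 1 + 2^-23, which is itself a float32 value.
var roundedOnce = toFloat32((x + y) + z);

console.log(roundedEachStep === roundedOnce);
```

Without the intermediate cast, the extra float64 precision lets the two small terms add up before any rounding happens, so the final float32 differs.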
This identity provides the preconditions that allow a compiler to soundly use float32 instructions instead of float64 instructions. Indeed, gcc will take advantage of this identity and, for expressions of the form on the right of the equation, will generate float32 code corresponding to the expression on the left.
But when does JavaScript ever have a float32 value? In HTML5 (with WebGL and, thus, Typed Arrays), the answer is: when reading from or writing to a Float32Array. For example, take this piece of code:
var f32 = new Float32Array(1000);
for (var i = 0; i < 1000; ++i)
  f32[i] = f32[i] + 1;
The addition inside the loop exactly matches the identity above: we take a float32 value from f32, convert it to a float64, do a float64 addition, and cast the result back to a float32 to store in f32. (We can view the literal 1 as a float32 1 cast to a float64 1 since 1 is precisely representable by a float32.)
But what if we want to build more complicated expressions? Well, we could insert unnecessary Float32Array loads and stores between each subexpression so that every operation's operands were a load and the result was always stored to a Float32Array, but these additional loads and stores would make our code slower and the whole point is to be fast. Yes, a sufficiently smart compiler might be able to eliminate most of these loads/stores, but performance predictability is important so the less fragile we can make this optimization the better. Instead, we proposed a tiny builtin that was accepted into the upcoming ES6 language spec: Math.fround.
Math.fround
Math.fround is a new Math function proposed for the upcoming ES6 standard. This function rounds its input to the closest float32 value, returning this float32 value as a number. Thus, Math.fround is semantically equivalent to the polyfill:
if (!Math.fround) {
  Math.fround = (function() {
    var temp = new Float32Array(1);
    return function fround(x) {
      temp[0] = +x;
      return temp[0];
    };
  })();
}
Note that some browsers don't support Typed Arrays; for these, more complex polyfills are available. The good news is that Math.fround is already implemented in both SpiderMonkey (the JavaScript engine behind Firefox) and JavaScriptCore (the JavaScript engine behind Safari). Moreover, the V8 team plans to add it as well, as stated in this issue.
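A few calls make the rounding behavior concrete. This is a minimal sketch that assumes an engine where Math.fround is available; the variable names are only for illustration.

```javascript
// A value that is exactly representable as a float32 comes back unchanged.
var exact = Math.fround(1.5);

// 0.1 has no exact float32 representation, so the nearest float32 is returned,
// which is a different number from the float64 closest to 0.1.
var rounded = Math.fround(0.1);

// Rounding an already-rounded value changes nothing.
var roundedTwice = Math.fround(rounded);

// The same value is obtained by a Float32Array store followed by a load.
var viaTypedArray = new Float32Array([0.1])[0];

console.log(exact === 1.5, rounded === 0.1, rounded === roundedTwice, rounded === viaTypedArray);
```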
As a result, the way to chain float32 operations is simply to wrap any temporary result in a call to Math.fround:
var f32 = new Float32Array(1000);
for (var i = 0; i < 999; ++i)
  f32[i] = Math.fround(f32[i] + f32[i+1]) + 1;
In addition to allowing the programmer to write faster code, this also allows JS compilers, like Emscripten, to better compile float32 in the source language. For example, Emscripten currently compiles C++ float operations to JavaScript number operations. Technically, Emscripten could use Float32Array loads/stores after every operation to throw away the extra float64 precision, but this would be a big slowdown, so fidelity is sacrificed for performance. Although it's quite rare for this difference to break anything (if it does, the program is likely depending on unspecified behavior), we have seen it cause real bugs in the field and these are not fun to track down. With Math.fround, Emscripten would be able to be both more efficient and higher fidelity!
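The wrapping rule applies in the same way to larger expressions, whether written by hand or generated by a compiler. As a sketch only (dot3 is a hypothetical helper, not code from gl-matrix or Emscripten), a three-component dot product with every intermediate rounded to float32 might look like this:

```javascript
var fround = Math.fround;

// Hypothetical helper: dot product of two length-3 Float32Arrays.
// Every product and every partial sum is wrapped in fround, so each
// operation has float32 inputs and a float32 result.
function dot3(a, b) {
  var sum = fround(a[0] * b[0]);
  sum = fround(sum + fround(a[1] * b[1]));
  sum = fround(sum + fround(a[2] * b[2]));
  return sum;
}

var u = new Float32Array([1, 2, 3]);
var v = new Float32Array([4, 5, 6]);
var d = dot3(u, v);

console.log(d);
```

Reads from a Float32Array already produce float32 values, so only the results of arithmetic need the explicit call.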
Float32 in IonMonkey
My internship project was to bring these optimizations to Firefox. The first step was to add general support for float32 operations in the IonMonkey JIT backend. Next, I added Math.fround as a general (unoptimized) builtin to the JavaScript engine. Finally, I added an optimization pass that recognizes Float32Array / Math.fround and uses their commutative properties to emit float32 operations when possible. These optimizations are enabled in Firefox 27 (which is currently in the Aurora release channel).
So, how does it perform? Microbenchmarks (in both C++ and JavaScript) show large speedups, up to 50%. But microbenchmarks are often misleading, so I wrote the following more realistic benchmarks to see what sort of speedups we can expect in practice on float32-heavy computations:
- Matrix inversions: this benchmark creates a bunch of matrices, inverts them and multiplies them back with the original, to be able to compare the precision loss when using float32 or float64. It uses an adapted version of gl-matrix, which is a framework used for real-world applications using WebGL matrices. For the float32 version, only calls to Math.fround have been added.
- Matrix graphics: this benchmark also creates a bunch of matrices and applies some operations that are frequently used in graphics: translation, rotation, scaling, etc. This one uses a lot of basic operations and more complex operations (like calls to Math.cos and Math.sin for the rotation). Thus, it shows great improvements when the float32 equivalent forms of these functions are faster. Once more, it uses the adapted version of gl-matrix.
- Exponential: this benchmark fills a big Float32Array with predictable values and then computes the exponential of each element by using the first terms of the exponential's power series. The main purpose of this benchmark is just to pound on addition and multiplication.
- Fast Fourier Transform: this benchmark creates a fake sample buffer and then applies several steps of Fast Fourier Transform. It also consists of basic operations and some calls to Math.sqrt. The FFT code is taken from an existing library, dsp.js.
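To give an idea of what the float32 version of such a kernel looks like, here is a minimal sketch in the spirit of the Exponential benchmark. It is not the benchmark's actual code (that is in the repository mentioned below); the function name expSeries, the input values and the choice of 12 terms are assumptions made for illustration.

```javascript
var fround = Math.fround;

// Hypothetical sketch: approximate exp(x) with the first `terms` terms of the
// power series 1 + x + x^2/2! + x^3/3! + ..., rounding every intermediate
// result to float32.
function expSeries(x, terms) {
  x = fround(x);
  var sum = fround(1);
  var term = fround(1);
  for (var n = 1; n < terms; ++n) {
    // term_n = term_(n-1) * x / n
    term = fround(fround(term * x) / fround(n));
    sum = fround(sum + term);
  }
  return sum;
}

var input = new Float32Array([0, 0.5, 1]);
var output = new Float32Array(input.length);
for (var i = 0; i < input.length; ++i) {
  output[i] = expSeries(input[i], 12);
}

console.log(output[0], output[1], output[2]);
```

The float64 version of the same kernel is obtained by replacing fround with the identity function, which is what makes a side-by-side comparison straightforward.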
The following table shows results on several different devices, when run on the latest Firefox Nightly. Each number is the speedup obtained from using code that has been optimized to use Math.fround to allow the float32 optimization described above (thus the higher, the better). The desktop machine used is a Lenovo ThinkPad W530 (Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz, 8 cores, 16 GB RAM). When a line indicates a phone or tablet device, the device runs the latest Firefox Nightly for Android version. Once you've read these results, you can try to run these benchmarks yourself! (Don't forget to use Firefox 27 or greater!) You can see the benchmark source on GitHub (on the gh-pages branch).
Device | Matrix Inversions | Matrix Graphics | Exponential | FFT |
---|---|---|---|---|
Desktop (x86) | 33% | 60% | 30% | 16% |
Google Nexus 10 (ARM) | 12% | 38% | 33% | 25% |
Google Nexus 4 (ARM) | 42% | 26% | 38% | 5% |
Samsung Galaxy S3 (ARM) | 38% | 38% | 24% | 33% |
Polyfilling Math.fround
What can we do before Math.fround is available and optimized in all JS engines? Instead of using a faithful polyfill like the one shown above, we can simply use the identity function:
var fround = Math.fround || function(x) { return x }
This is what the above benchmarks use, and, as stated above, most code won't notice the difference.
What's nice is that all modern JS engines will usually inline small functions in their high-end JIT so this polyfill shouldn't penalize performance. We can see this to be the case when running the four benchmarks shown above in, e.g., Chrome Dev. However, we have seen some cases in larger codebases where inlining is not performed (perhaps the max inlining depth was hit or the function wasn't compiled in the high-end JIT for some reason) and performance suffers with the polyfill. So, in the short term, it's definitely worth a try, but be sure to test.
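When testing, it can help to know which variant is actually in use. One possible check, shown here as a sketch (the variable name froundReallyRounds is made up for illustration), relies on the fact that 0.1 is not exactly representable as a float32:

```javascript
// Same fallback as above: the real Math.fround when present, identity otherwise.
var fround = Math.fround || function(x) { return x; };

// A faithful fround changes 0.1 when rounding it to float32;
// the identity fallback returns it untouched.
var froundReallyRounds = fround(0.1) !== 0.1;

console.log(froundReallyRounds ? "float32 rounding active" : "identity fallback in use");
```

With the identity fallback, results keep float64 precision, so any comparison against float32 reference values should allow for that difference.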
Conclusion
The results are very encouraging. Since Math.fround is in the upcoming ES6 standard, we are hopeful that other JS engines will choose to make the same optimizations. With the Web as the game platform, low-level optimizations like these are increasingly important and will allow the Web to get ever closer to native performance. Feel free to test these optimizations out in Firefox Nightly or Aurora and let us know about any bugs you find.
I would like to thank all those who participated in making this happen: Jon Coppeard and Douglas Crosher for implementing the ARM parts, Luke Wagner, Alon Zakai and Dave Herman for their proof-reading and feedback, and more generally everyone on the JavaScript team for their help and support.