IonMonkey: Evil on your behalf

“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered.”
This quote is not as well known as the end of it, often shortened to “premature optimization is the root of all evil”. It highlights why you should prefer code which is more maintainable, as long as the performance of it does not impact the user.
 

Maintainable code is a matter of taste. For example, we often teach students to dislike “goto” in C, but the best error handling code I have seen (in C) comes from the Linux kernel, and is using “goto”’s. Maintainable code in JavaScript is also a matter of taste. Some might prefer using ES6 features, while others might prefer functional programing approach with lambdas or even using some framework.

In most cases, we add more abstractions, we add more memory, and more code, thus giving even more reasons for the code to be slower. Today, I will introduce two optimizations that made it into IonMonkey, which are known as Scalar Replacement and Branch Pruning.
 

Scalar Replacement

 
The goal of Scalar Replacement is to reduce the amount of memory needed to represent objects in memory, by replacing object properties with local variables.  Then, when all objects properties are replaced, we can remove the object and avoid its memory overhead (i-e allocation, accesses, and GCs).
 
For example, in the following code, we have a few functions for manipulating complex numbers as a tuple of a real and an imaginary part.  When compiling the norm_multiply function, the Inlining phase will bake the code of complex and norm functions inside the norm_multiply function.

function complex(r, i) {
    return { r: r, i: i };
}
function norm(c) {
    return Math.sqrt(c.r * c.r + c.i * c.i);
}
function norm_multiply(a, b) {
    var mul = complex(a.r * b.r - a.i * b.i, a.r * b.i + a.i * b.r);
    return norm(mul);
}
Thus, Scalar Replacement handles the object coming from the complex function, and replaces it by two local variables.  The object is no longer needed and IonMonkey effectively runs the following code:

function norm_multiply(a, b) {
    var mul_r = a.r * b.r - a.i * b.i;
    var mul_i = a.r * b.i + a.i * b.r;
    return Math.sqrt(mul_r * mul_r + mul_i * mul_i);
}
This optimization works by looking at one object, then determining if the object never escapes (detailed below), that all properties are known, and that it is not mixed with other objects.
 
Once all these predicates are validated, this optimization emulates the content of the memory of the object while traversing the control flow graph of the program, thereby replacing all property accesses with reads of local variables.
 
Once all property accesses are removed, the allocated object is used only by a few recover instructions that are capturing the object state at across the control flow graph. Each object state is an instruction that serves no purpose during the execution, but uses the Recover Instructions mechanism to set the content of the properties on the deoptimization path (bailout) to Baseline compiled code. At the end, the Sink / Dead Code Elimination phases convert the object allocation into a recovered instruction, as it is only used by the object state instructions.
 

Escape Analysis

 
One of the biggest limitation of Scalar Replacement is that it is limited to objects that do not escape.
 
We have multiple ways for an object to escape:
  • a function call.
    
    escape({ bar: 1, baz: 2 });
  •  a returned value.
    
    function next() {
        return { done: false, value: 0};
    }
  • an escaped object property.
    
    escaped_obj = { property: obj };
    escape(escaped_obj);
    
The problem of an escaped object is that we have no idea how the object would be manipulated, thus we cannot safely replace the properties of the object with local variables.
 
The good news is that we already have an optimization phase, called Inlining, that already takes care of inserting the code of smaller functions into the body of their calling functions, as long as they are frequently used.
 
Inlining has some limitations, such as the size and hotness of the function. For example, it might be possible for scalar replacement to never trigger if the object escapes in an unused branch, such as in exception handling or logging code.
 
function doSomething(obj) {
    if (theHighlyUnlikelyHappened())
        throw new Error("aaaaaaahhh!!", obj);
}
Fortunately, this kind of problem is addressed by the Branch Pruning optimization described in the next section.
 

Mixing Objects from Multiple Allocation Sites

 
One other limitation of Scalar Replacement is that we have to be able to identify a single allocation that dominates (i-e in the same block or in any enclosing block) the rest of its uses.  For example, the following code causes problem because at the end of the then-block, we do not know which branch the allocated object comes from.
 
function dummy() {
    var obj = { done: false }; // (a)
    if (len == max)
        obj = { done: true }; // (b)
    if (obj.done) // obj comes from either (a) or (b)
        console.log("We are done! \o/");
}
This issue also appears in the case of a returned object.  When a function is inlined, all the return statements are funneled into a return block.  Thus, Scalar Replacement is not capable of mixing multiple objects allocations.  This problem occurs in the next function of iterators:
 
function next() {
    if (this.idx < this.len)
        return { value: this.getValue(idx), done: false };
    return { done: true };
}
As long as all properties are known, these can be transformed quite easily, by creating a single object allocation ahead of the condition and mutating the object, while returning the same object from all paths.
 
function next() {
    var obj = { value: undefined, done: false };
    if (this.idx < this.len) {
        obj.value = this.getValue(idx);
        return obj;
    }
    obj.done = true;
    return obj;
}
This problem actually occurred in the self-hosted code of the Array iterator‘s next() function, used by for-of.  At first we rewrote it as shown in the example above, but to properly handle the security model of SpiderMonkey, this trick was not enough. The security model requires an extra branch which adds a new return statement with a different object, which cannot be merged above. Fortunately, this issue goes away with Branch Pruning, as we will see below.
 

Known Properties

 
Another limitation of Scalar Replacement is the requirement to identify property accesses at compile time. This implies that one cannot expect to have a working Scalar Replacement for a loop iterating over the properties of an object, or the indexes of an array.
 
function norm1(vec) {
    var sum = 0;
    for (var i = 0; i < vec.length; i++)
        sum += vec[i];
    return sum;
}
This is one case where a Loop Unrolling optimization might be useful in the future.
 

Lambdas

 
Lambdas are one of the cases where Scalar Replacement makes a lot of sense, especially in SpiderMonkey where the scope chain uses objects as the underlying representation.
 
Each time you execute code that gives a lambda literal as an argument, a new function is created.  This new function holds a pointer to its function environment, which itself holds a pointer to the scope chain of the enclosing function.
 
In cases where none of the lambdas within a function escape, we can use scalar replacement to optimize scope chain accesses within the inlined code of the lambdas.
 
For example, in the following code, the Inlining will add the code for the forEach function, and the code of the lambda given as argument to the forEach function, into the caller. Then Scalar Replacement will detect that the lambda does not escape, since forEach and the lambda are inlined, and it will replace the scope chain holding the captured sum variable with a local variable.
 
function norm1(vec) {
    var sum = 0;
    vec.forEach((x) => { sum += x; });
    return sum;
}
At the same time, the scope chain allocation as well as the new function allocation holding it will be moved to the bailout path.  Thus, we will no longer do any allocation to execute this function.  In this case, Scalar Replacement makes this forEach call as fast as the C-like for loop equivalent.
 
Scalar Replacement has a lot of pitfalls, but when all the conditions are met, it can remove tons of allocations while allowing JavaScript developers to use higher levels of abstraction.
 
As soon as we make generic functions, it becomes quite hard to avoid these pitfalls. Just the fact that the code is present, even if it is never executed, causes these pitfalls to appear.
 

Branch Pruning

 
The goal of Branch Pruning is to remove unused branches. This is similar to the badly named Profiler Guided Optimization (PGO) phase that we have in static compilers, except that instead of only moving infrequently used branches out of the main code path, we remove them entirely from IonMonkey’s generated code.
 
To do so, we instrument SpiderMonkey to count the number of time each block of code gets executed. Then, when compiling the code, we check based on hand-crafted heuristics whether a block of code should be removed or not. The heuristics select branches that have never been executed, are too complex (i-e store values in memory, make calls, have a large number of instructions) and have predecessors with a large numbers of executions. If a block should be removed, then we replace the branch with a bailout that will fall back to the Baseline compiled code to resume the execution.
 
This optimization alone does not bring much performance improvement. At best, it can speed up the compiler by removing a chunk of the workload, and help the instruction cache of the CPU by removing instructions from the pipeline.
 
However, Branch Pruning helps the other phases of IonMonkey. By removing unused branches, we improve other optimizations such as Scalar Replacement, Loop Invariant Code Motion, etc.
 
Thus code that has to handle unlikely error cases, such as in the following, can be modified by Branch Pruning to remove the block that has never executed yet.
 
function doSomething(obj) {
    if (theHighlyUnlikelyHappened())
        throw new Error("aaaaaaahhh!!", obj);
}
In this example, IonMonkey will convert the then-block into a path which no longer merges back in the control flow graph, and instead does a bailout to Baseline. In JavaScript terms, this would be similar to doing a yield while we are in IonMonkey’s compiled code and resuming the execution in Baseline’s compiled code. Baseline’s compiled code still has the instruction for running the exception handling code.
 
// IonMonkey's compiled code
function doSomething(obj) {
    if (theHighlyUnlikelyHappened())
        bailout; // yield and resume in Baseline compiled code.
}

Frameworks

 
In today’s usage of JavaScript, a single page uses multiple frameworks. These frameworks are likely made to handle more than one use case. As a user of these frameworks, you are probably only using a few, either by convention or by habit.
 
The promise of Branch Pruning is that all the branches of code that you are not using at all will be removed. Thus unused branches would not prevent optimizations.
 
For example, as mentioned in the Scalar Replacement optimization, the Array iterator’s next() function has an extra branch to support the security model of SpiderMonkey.  This adds an extra branch that is unlikely to be used by most websites, thus Branch Pruning is able to remove this branch and replace it with a bailout.
 
As this branch is replaced by a bailout, this waives the limitation preventing Scalar Replacement.  Thus, the fact that Branch Pruning removes code enables Scalar Replacement to reduce the memory allocations, and also optimize the execution of for-of loops.
 
The following histogram represents the relative speed of the for-of micro-benchmark compared against the same for loop written in C-style.  In addition, we compare with the improvements provided by Scalar Replacement (SR), Branch Pruning (BP), and with both enabled (SR+BP).  This highlights that these two optimizations are better than the sum of their individual contributions.  These optimizations are dividing the time taken by for-of loops by a factor of 2.5x.
 
 

Future Work

 
Scalar Replacement and Branch Pruning, when combined, are able to remove a lot of code and allocations, and are able to improve the speed of frameworks such as ES6 for-of loops (only a factor of 2.8x behind a C-like for loop). To make for-of loops as fast as C-like for loops, we would need to remove the bounds checks on the array element indexes, as well as add support for already allocated objects to Scalar Replacement.
 
Branch Pruning heuristics are too conservative today, meaning that we will not attempt to remove branches unless we are confident that the branch is not going to be used. This problem comes from the fact that bailouts are costly. Hopefully, this will be addressed in the future in a project with the code name ThreeHeadedMonkey.
 
Thanks to the blog post reviewers, with special thanks to Steve Fink, and to all the SpiderMonkey team for helping with the Scalar Replacement and Branch Pruning implementations.

No responses yet

Post a comment

Post Your Comment