Optimizing JavaScript variable access

I recently finished a project to improve how SpiderMonkey implements variable access so I thought this would be a good time to explain how it all works now. Taking a note from mraleph’s post (and SICP), I’ll illustrate the implementation using JavaScript as the implementation language. That is, I’ll translate JavaScript using full-featured variable access into JavaScript that doesn’t, rather like how the original C++ compiler translated C++ into C.

Before starting, let me set up the problem space. By variable I’m referring not just to the variables introduced by var, but also those introduced by let, const, catch, function statements, and function argument lists. By variable access, I mean a read or a write. Variable access can take many forms:

Local access (i.e., access to a variable in the same function):
```
function add(x,y) { return x+y }
```
Non-local access (i.e., access to a variable in an enclosing function):
```
function add(x,y) { return (function() { return x+y })() }
```

Access from dynamically-generated code:
```
function add(x,y) { return eval("x+y") }
```

Access after dynamic scope modification via non-strict direct eval:
```
function add(a,b) { eval("var x="+a+", y="+b); return x+y }
```
Dynamic function argument access via the arguments object:
```
function add(x,y) { return arguments[0]+arguments[1] }
```
Unexpected debugger snooping (via Firebug, the new builtin Firefox debugger, or directly from privileged JS using the new Debugger API):
```
dbg.onDebuggerStatement = function(f) { return f.eval("x+y") }
```

To keep the post small(-ish), I’ll pretend there is only (non-strict, direct) eval and ignore strict and indirect eval as well as with (which we generally deoptimize as if it was an eval). I’ll also ignore let, global access optimizations, the bizarre things SpiderMonkey does for block-level function statements, and the debugger.

The worst case

To rise above, we must first see how low we need to go in the worst case. Consider the following function:

function strange() {
  eval("var x = 42");
  return function xPlus1() { var z = x + 1; return z }
}

Here, eval is dynamically adding x to the scope of strange where it will be read by xPlus1. Since eval can be called with a dynamically-constructed string we must, in general, treat function scopes as dynamic maps from names to values. (Fun fact: names added by eval can be removed using the delete keyword, so the map can both grow and shrink at runtime!)
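To see the map grow and shrink, here is a small sketch that adds a name with eval and then deletes it. The name addedByEval is made up for illustration. The function is built with the Function constructor only so that its body is non-strict code regardless of how the surrounding file is loaded.

```javascript
// Built with the Function constructor so the body is non-strict code
// even if the surrounding file is a module or uses "use strict".
const growAndShrink = new Function(`
  var before = typeof addedByEval;      // name is not in scope yet
  eval("var addedByEval = 42");         // the scope grows by one name
  var during = typeof addedByEval;
  var removed = delete addedByEval;     // the scope shrinks again
  var after = typeof addedByEval;
  return { before: before, during: during, removed: removed, after: after };
`);

const observed = growAndShrink();
console.log(observed);
```

A var declared directly in the function body cannot be deleted this way; only the bindings introduced by eval are removable.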

To make this more concrete, we’ll implement scopes in JS using ES6 Map objects. We’ll give every function its own Map that will be stored in a local variable named scope and hold all the function’s variables. (Yes, we’re using a variable to implement variables; but since we’ll only use a small finite number of them, we can think of them as registers.)

function strange() {
  // the scope of 'strange' is initially empty
  var scope = new Map;

  // eval("var x = 42") effectively executes:
  scope.set('x', 42);

  return function xPlus1() {
    // vars are hoisted so scope initially contains 'z'
    var scope = new Map([['z', undefined]]);

    // var z = x + 1
    scope.set('z', scope.get('x') + 1);  // oops!

    // return z
    return scope.get('z');
  }
}

As the comment indicates, there is a bug in xPlus1: x isn’t in the scope of xPlus1, it’s in the scope of strange! To fix this we need to do two things:

Add an enclosing field to all scope objects indicating the enclosing function’s scope (or the global object if the function is top-level).
Replace uses of scope.get with a lookup algorithm that walks the chain of scopes.

function strange() {
  // the scope of 'strange' is initially empty
  var scope = new Map;
  scope.enclosing = window;

  // eval("var x = 42") effectively executes
  scope.set('x', 42);

  var tmp = function xPlus1() {
    // vars are hoisted so scope initially contains 'z'
    var scope = new Map([['z', undefined]]);
    scope.enclosing = xPlus1.enclosing;

    // var z = x + 1
    scope.set('z', lookup(scope, 'x') + 1);

    // return z
    return lookup(scope, 'z');
  }
  tmp.enclosing = scope;
  return tmp;
}

function lookup(scope, name) {
  while (scope instanceof Map && !scope.has(name))
    scope = scope.enclosing;
  // the chain ends at the global object, which is not a Map
  return scope instanceof Map ? scope.get(name) : scope[name];
}

Note that, without being able to use non-local variable access (since that is what we are implementing), we must attach the scope of strange to the xPlus1 function object. This isn’t just some hack; it is a fundamental part of the implementation of languages with lexically-scoped first-class functions. More generally, we can establish the following relationship (pardon my ASCII-art):

Function-scope
  | *        ^ 0 or 1
  |          |
  | call of  | enclosing
  |          |
  V 1        | 1
Function-object
  | *
  |
  | evaluation of
  |
  V 1
Function-literal

Each function literal can be evaluated any number of times, with each evaluation producing a function object that is associated with its enclosing scope. Each of those function objects can be called any number of times, each of those calls producing a scope. When using the language, it is easy to see just a single concept, “function”, but hopefully this illustrates that there are really three “function” concepts at play here: scope, object, and literal.
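The three concepts can be told apart in a short sketch. Below, a single function literal is evaluated twice, producing two function objects, each attached to a different enclosing scope. The names makeGetter, getter and lookupIn are hypothetical, and the chain ends at null rather than the global object to keep the example self-contained.

```javascript
// Hypothetical helper: walk a chain of Map scopes, as lookup does above.
function lookupIn(scope, name) {
  while (scope instanceof Map && !scope.has(name))
    scope = scope.enclosing;
  return scope instanceof Map ? scope.get(name) : undefined;
}

function makeGetter(value) {
  // each call of makeGetter produces one function scope
  const scope = new Map([['value', value]]);
  scope.enclosing = null;

  // each evaluation of this literal produces one function object
  const getter = function getter() {
    // each call of the function object produces its own scope
    const scope = new Map();
    scope.enclosing = getter.enclosing;
    return lookupIn(scope, 'value');
  };
  getter.enclosing = scope;
  return getter;
}

const getOne = makeGetter(1);
const getTwo = makeGetter(2);
console.log(getOne(), getTwo());
```

Both function objects come from the same literal, yet getOne.enclosing and getTwo.enclosing are distinct Map objects, which is why the two calls see different values.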

With these changes, we have successfully dealt with the ravages of eval, but at what cost? Each variable access involves a call to an algorithm that iteratively performs hash-table lookups! Fortunately, this problem isn’t that different from object-property lookup and the same type of optimizations apply: hidden classes and caches. I won’t go into these techniques, as there are already two great explanations available. (Caching has been used to speed up name access since Firefox 3.) Even with these optimizations, however, name lookup isn’t as fast as we’d like it to be and we are still creating a Map object on every call.

In summary, we’ve handled the worst case, but we’d like to do better in code that doesn’t exercise the worst case.
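Variable access means writes as well as reads, and a write walks the chain the same way. The sketch below pairs the read with a hypothetical assign. It assumes a plain object named globals standing in for the global object, and it simplifies one thing: reading an undeclared name yields undefined here instead of throwing a ReferenceError.

```javascript
// Hypothetical stand-in for the global object at the end of every chain.
const globals = { y: 1 };

function makeScope(entries, enclosing) {
  const scope = new Map(entries);
  scope.enclosing = enclosing;
  return scope;
}

function lookup(scope, name) {
  while (scope instanceof Map && !scope.has(name))
    scope = scope.enclosing;
  return scope instanceof Map ? scope.get(name) : scope[name];
}

function assign(scope, name, value) {
  while (scope instanceof Map && !scope.has(name))
    scope = scope.enclosing;
  if (scope instanceof Map)
    scope.set(name, value);   // found in some function scope
  else
    scope[name] = value;      // fell off the chain: write to the global
}

const outer = makeScope([['x', 42]], globals);
const inner = makeScope([['z', undefined]], outer);

assign(inner, 'z', lookup(inner, 'x') + 1);  // z = x + 1
assign(inner, 'x', 0);                       // updates x in the outer scope
assign(inner, 'y', 2);                       // updates the global y
```

Note that the write to x lands in the scope that already holds x, not in the innermost scope.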

Fast local name access

Now let’s optimize local variable access when all accesses are local. With this constraint, JavaScript starts to look like C and we can use some of the same techniques as a C compiler: store all variables in a stack and access variables by their offset in the stack.

As a first (highly garbalicious) iteration, we create an array for each set of arguments and vars, thereby turning

foo(13, 42);

function foo(x,y) {
  var a = x + y;
  return bar(a);
}

into:

foo([13, 42]);

function foo(args) {
  var vars = [undefined];
  vars[0] = args[0] + args[1];
  return bar([vars[0]]);
}

The second step is to avoid creating all those temporary arrays by using one big array, shared by all active functions. There are many ways to do this (corresponding to different calling conventions); we’ll just do something simple here:

// executed some time before the first function call:
var stack = [];

stack.push(13);
stack.push(42);
foo(/* number of arguments pushed = */ 2);

function foo(numArgs) {
  // push missing arguments, pop extra arguments
  for (var i = numArgs; i < 2; i++)
    stack.push(undefined);
  for (var i = numArgs; i > 2; i--)
    stack.pop();

  // analogous to the frame pointer register
  var firstLocal = stack.length;

  // push local 'a'
  stack.push(undefined);

  // var a = x + y:
  stack[firstLocal] = stack[firstLocal - 2] + stack[firstLocal - 1];

  // prepare stack for call to 'bar(a)':
  stack.push(stack[firstLocal]);
  var returnValue = bar(/* number of arguments pushed = */ 1);

  // in this calling convention, the callee pops the arguments
  stack.pop(); // pop 'a'
  stack.pop(); // pop 'y'
  stack.pop(); // pop 'x'
  return returnValue;
}

With this strategy, a JIT compiler can do some pretty great optimization. To start with, each read from or write to stack in the above JS can be compiled down to a single CPU load or store. This is achieved by caching the address of stack[firstLocal] in a register and rolling the remaining “+ INDEX” into the load instruction as an offset. Even better, modern JavaScript JIT compilers do register allocation which can avoid the loads/stores altogether. (Register allocation has been in Firefox since version 3.5.)

In summary, we can do pretty efficient things for local variable access, but only with some stringent restrictions.
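To check that this calling convention leaves the stack balanced, here is a runnable sketch of the shared-stack scheme. The body of bar is an assumption (it just doubles its argument), since the example above never defines it.

```javascript
const stack = [];

// Hypothetical callee: bar(a) returns a * 2.
function bar(numArgs) {
  // push missing arguments, pop extra arguments
  for (let i = numArgs; i < 1; i++) stack.push(undefined);
  for (let i = numArgs; i > 1; i--) stack.pop();

  const firstLocal = stack.length;
  const returnValue = stack[firstLocal - 1] * 2;

  stack.pop(); // callee pops its own argument 'a'
  return returnValue;
}

function foo(numArgs) {
  for (let i = numArgs; i < 2; i++) stack.push(undefined);
  for (let i = numArgs; i > 2; i--) stack.pop();

  const firstLocal = stack.length;
  stack.push(undefined);                                  // local 'a'
  stack[firstLocal] = stack[firstLocal - 2] + stack[firstLocal - 1];

  stack.push(stack[firstLocal]);                          // argument for bar
  const returnValue = bar(1);

  stack.pop(); // pop 'a'
  stack.pop(); // pop 'y'
  stack.pop(); // pop 'x'
  return returnValue;
}

stack.push(13);
stack.push(42);
const result = foo(2);
console.log(result, stack.length); // 110 0
```

Passing an extra argument, for example pushing 1, 2 and 3 and calling foo(3), also leaves the stack empty because the extra value is popped before firstLocal is computed.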

Fast non-local access

While we shouldn’t expect great performance when functions use eval or arguments, the requirement made in the previous section that we only access local variables is pretty harsh and conflicts with both the functional and module patterns of JavaScript programming. In this section, we’ll optimize non-local access.

We start with the observation that, in the absence of eval and other weirdos, there is no need for a fully dynamic scope lookup: we can know exactly where on the scope chain to find the variable being accessed. The first step is to view each top-level function as a tree of nested functions, giving each node (function) in the tree an array of the variables defined in its scope. For example, given this function:

function add3(arg1, arg2, arg3) {
  function addInner(innerArg1) {
    function innermost() { return innerArg1 + arg2 + getArg3() };
    return innermost();
  }
  function getArg3() {
    return arg3;
  }
  return addInner(arg1);
}

we can distill the following tree:

function add3: [arg1, arg2, arg3, addInner, getArg3]
 |\_ function addInner: [innerArg1, innermost]
 |    \_ function innermost: []
  \_ function getArg3: []

The next step is to include uses as leaves of the tree that are linked to the innermost enclosing definition with the same name. Rather than drawing terrible ASCII-art arrows, let’s represent a use-to-definition arrow with a two-number coordinate:

hops = the number of nodes in the tree to skip to get to the function node whose array contains the definition.
index = the index of the definition in the function node’s array.

Linking uses to definitions in the above tree produces:

function add3: [arg1, arg2, arg3, addInner, getArg3]
 |\_ function addInner: [innerArg1, innermost]
 |    |\_ function innermost: []
 |    |    |\_ "innerArg1"   {hops=1, index=0}
 |    |    |\_ "arg2"        {hops=2, index=1}
 |    |     \_ "getArg3"     {hops=2, index=4}
 |     \_ "innermost"        {hops=0, index=1}
 |\_ function getArg3: []
 |     \_ "arg3"             {hops=1, index=2}
 |\_ "addInner"              {hops=0, index=3}
 |\_ "getArg3"               {hops=0, index=4}
  \_ "arg1"                  {hops=0, index=0}
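The linking step can be sketched as a walk up the tree that counts nodes until a definition is found. The node representation below (a names array plus a parent link) is an assumption made for illustration, not SpiderMonkey's data structure.

```javascript
// Hypothetical representation: each function node lists the names it
// defines and points at its parent function node (null at the top).
function makeNode(names, parent) {
  return { names, parent };
}

// Resolve a use of `name` inside `node` to a {hops, index} coordinate.
function resolve(node, name) {
  let hops = 0;
  for (let n = node; n; n = n.parent, hops++) {
    const index = n.names.indexOf(name);
    if (index !== -1)
      return { hops, index };
  }
  return null; // not defined in any enclosing function
}

const add3 = makeNode(['arg1', 'arg2', 'arg3', 'addInner', 'getArg3'], null);
const addInner = makeNode(['innerArg1', 'innermost'], add3);
const innermost = makeNode([], addInner);
const getArg3 = makeNode([], add3);

console.log(resolve(innermost, 'arg2')); // { hops: 2, index: 1 }
console.log(resolve(getArg3, 'arg3'));   // { hops: 1, index: 2 }
```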

As a last step, we’ll erase all variables that only have local uses. We can also remove entire scopes if they are empty; we just need to be mindful not to include these removed scopes in any hops count. Applying this last transformation produces the following, final tree:

function add3: [arg2, arg3, getArg3]
 |\_ function addInner: [innerArg1]
 |     \_ function innermost: 
 |         |\_ "innerArg1"   {hops=0, index=0}
 |         |\_ "arg2"        {hops=1, index=0}
 |          \_ "getArg3"     {hops=1, index=2}
 |\_ function getArg3: 
 |     \_ "arg3"             {hops=0, index=1}
  \_ "getArg3"               {hops=0, index=2}
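The erasing step can be sketched as a two-pass analysis: first mark every definition that is used from another function, then compute coordinates while skipping scopes that ended up empty. The input format (defs, uses, parent) and the function names are hypothetical.

```javascript
// Hypothetical input: each function node lists its definitions, the names
// it uses directly, and its parent function node.
function makeNode(defs, uses, parent) {
  return { defs, uses, parent, aliased: [] };
}

// Innermost node, starting at n itself, that defines `name`.
function definingNode(n, name) {
  while (n && !n.defs.includes(name)) n = n.parent;
  return n;
}

// Pass 1: keep only definitions that some other function uses.
function analyze(nodes) {
  const marked = new Map(nodes.map(n => [n, new Set()]));
  for (const n of nodes)
    for (const use of n.uses) {
      const def = definingNode(n, use);
      if (def && def !== n) marked.get(def).add(use);
    }
  for (const n of nodes)
    n.aliased = n.defs.filter(d => marked.get(n).has(d)); // keep definition order
}

// Pass 2: coordinate of a use, not counting scopes that were removed.
function coordinate(n, name) {
  const def = definingNode(n, name);
  if (!def || !def.aliased.includes(name))
    return null; // global, or a local that can live on the stack
  let hops = 0;
  for (let s = n; s !== def; s = s.parent)
    if (s.aliased.length > 0) hops++;
  return { hops, index: def.aliased.indexOf(name) };
}

const add3Node = makeNode(['arg1', 'arg2', 'arg3', 'addInner', 'getArg3'],
                          ['addInner', 'getArg3', 'arg1'], null);
const addInnerNode = makeNode(['innerArg1', 'innermost'], ['innermost'], add3Node);
const innermostNode = makeNode([], ['innerArg1', 'arg2', 'getArg3'], addInnerNode);
const getArg3Node = makeNode([], ['arg3'], add3Node);

analyze([add3Node, addInnerNode, innermostNode, getArg3Node]);
console.log(add3Node.aliased);                  // [ 'arg2', 'arg3', 'getArg3' ]
console.log(coordinate(innermostNode, 'arg2')); // { hops: 1, index: 0 }
```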

With this analysis, we have all the information we need to efficiently compile the program. For the local-only variables that we removed in the last step, we can use the stack directly (as in the second section). For variables with non-local access, we can represent the scope chain as a linked list of scopes (as in the first section), except this time we represent scopes as arrays instead of maps. To compile an access, we use the {hops,index} coordinate: hops tells us how many .enclosing links to follow, index tells us the index in the array.
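As a minimal sketch of what such a compiled access does at runtime, the helpers below follow hops .enclosing links and then index into the array. The names getAliased and setAliased are made up here, and the sample values 10, 20 and 30 are arbitrary.

```javascript
// Read the variable at coordinate {hops, index}, starting from `scope`.
function getAliased(scope, hops, index) {
  for (let i = 0; i < hops; i++)
    scope = scope.enclosing;
  return scope[index];
}

// Write the variable at coordinate {hops, index}, starting from `scope`.
function setAliased(scope, hops, index, value) {
  for (let i = 0; i < hops; i++)
    scope = scope.enclosing;
  scope[index] = value;
}

// Scopes shaped like the final tree: add3 is [arg2, arg3, getArg3],
// addInner is [innerArg1].
const add3Scope = [20, 30, undefined];
add3Scope.enclosing = null;
const addInnerScope = [10];
addInnerScope.enclosing = add3Scope;

// innerArg1 {hops=0, index=0} + arg2 {hops=1, index=0} + arg3 {hops=1, index=1}
const sum = getAliased(addInnerScope, 0, 0) +
            getAliased(addInnerScope, 1, 0) +
            getAliased(addInnerScope, 1, 1);
console.log(sum); // 60

setAliased(addInnerScope, 1, 0, 21); // arg2 = 21
```

Because hops and index are known at compile time, a compiler does not need the loop at all: it can emit exactly hops loads of .enclosing followed by one indexed access.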

Applying this scheme to the original example (and eliding the missing/extra arguments boilerplate) produces the following translated JS:

function add3() {
  var firstLocal = stack.length;

  // the optimized scope of add3 is: [arg2, arg3, getArg3]
  var scope = [stack[firstLocal-2], stack[firstLocal-1], undefined];
  scope.enclosing = window;

  // initialize 'addInner':
  stack.push(function addInner() {
    var firstLocal = stack.length;

    // the optimized scope of addInner is: [innerArg1]
    var scope = [stack[firstLocal - 1]];
    scope.enclosing = addInner.enclosing;

    // push local 'innermost'
    stack.push(function innermost() {
      // the scope of innermost is completely optimized away
      var scope = innermost.enclosing;

      // return innerArg1 {hops=0, index=0} +
      //        arg2      {hops=1, index=0} +
      //        getArg3() {hops=1, index=2}
      return scope[0] +
             scope.enclosing[0] +
             (scope.enclosing[2])();
    });
    stack[firstLocal].enclosing = scope;

    // return innermost()
    var returnValue = (stack[firstLocal])();
    stack.pop();  // pop 'innermost'
    stack.pop();  // pop 'innerArg1'
    return returnValue;
  });
  stack[firstLocal].enclosing = scope;

  // initialize 'getArg3' {hops=0, index=2}:
  scope[2] = function getArg3() {
    // the scope of getArg3 is completely optimized away
    var scope = getArg3.enclosing;

    // return arg3 {hops=0, index=1}
    return scope[1];
  }
  scope[2].enclosing = scope;

  // return addInner(arg1)
  stack.push(stack[firstLocal - 3]);
  var returnValue = (stack[firstLocal])();
  stack.pop();  // pop 'addInner'
  stack.pop();  // pop 'arg3'
  stack.pop();  // pop 'arg2'
  stack.pop();  // pop 'arg1'
  return returnValue;
}

This strategy is good for JIT compilation in several ways:

If a variable is only accessed locally, it can still live on the stack and receive full JIT optimization.
Each .enclosing expression compiles to a single load instruction. Furthermore, when there are multiple accesses to variables in the same scope, the compiler can factor out the common scope walking.
Since a non-local name access in this scheme is much simpler than the name cache mentioned earlier, IonMonkey is more able to apply the optimizations it uses for local names such as LICM, GVN, and DCE.
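Factoring out the common scope walking can be pictured with a hand-written before and after. This is only an illustration of the idea, not actual IonMonkey output, and both function names are hypothetical.

```javascript
// Two accesses to the same scope, each walking the chain from scratch:
// four .enclosing loads in total.
function sumWalkEachTime(scope) {
  return scope.enclosing.enclosing[0] + scope.enclosing.enclosing[1];
}

// The same accesses with the shared walk hoisted into a temporary:
// two .enclosing loads in total.
function sumWalkOnce(scope) {
  const target = scope.enclosing.enclosing;
  return target[0] + target[1];
}

const outermost = [3, 4];
outermost.enclosing = null;
const middle = [];
middle.enclosing = outermost;
const innerScope = [];
innerScope.enclosing = middle;

console.log(sumWalkEachTime(innerScope), sumWalkOnce(innerScope)); // 7 7
```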

In summary, we’ve now optimized non-local access while keeping local access fast. There are several other optimizations related to scopes that soften the blow when eval or arguments is used, but I think this is a good stopping point.

Next steps

The recent scope project basically catches us up to the level of other JS VMs. I should also note that functional languages have been doing similar optimizations forever. Looking forward, there are some straightforward optimizations I think we could do to avoid creating scope objects as well as more advanced optimizations we can lift from the functional crowd.

In SpiderMonkey

If you are interested in seeing the code for all this in SpiderMonkey, you can use the following links to get started:

The {hops,index} coordinate is called ScopeCoordinate.
The various scope objects are described in this ASCII-art tree. (Note, for mostly historical reasons, we use the same underlying representation for objects and scopes. Due to the Shape mechanism (which is pre-generated for scopes at compile-time), scopes are still, effectively, arrays.)
Optimized non-local access is performed with ALIASEDVAR opcodes. See the implementation of these ops in the interpreter and IonMonkey jit.
The frontend name analysis is a bit old and messy (and will hopefully be rewritten sometime in the near future). However, the important part of the analysis is at the very end, when we emit the ALIASEDVAR ops in EmitAliasedVarOp.

Nicolas Pierron has provided a French translation of this post. Thanks!
