The Ins and Outs of Invalidation

Kannan Vijayan


One of the primary goals of JIT engines for dynamic languages is to make high-level code run fast by compiling it into efficient machine code. A key constraint, however, is that the compiled code for any piece of JavaScript must behave exactly the same at runtime as the interpreted code for that JavaScript would have behaved. In a nutshell, providing efficiency while maintaining correctness is the fundamental problem faced by JIT compilers.

This efficiency is tricky to achieve because dynamic languages like JavaScript are highly polymorphic: any variable can hold values of any type, and in the general case, the runtime cannot know ahead of time what type a particular value will have. For example, even if the program will always add two integers together at a particular place in the code, the engine has to allow for the possibility that a non-integer value might unexpectedly show up.

There are two broad techniques which can be used to deal with this challenge. The first is guarding, where the engine generates machine code that checks its assumptions before executing the code that relies on those assumptions. For the adding example above, the engine will generate machine code to do integer addition, but it will make sure to prefix that code with checks which ensure that the two values being added are actually integers. If the checks fail, the machine code jumps to a slowpath that handles the uncommon case in a general way. If the checks succeed, the fastpath (or optimized) machine code is executed. The expectation is that the checks will succeed most of the time, and so the fastpath will be executed most of the time.
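
To make the pattern concrete, here is a rough JavaScript-flavoured sketch of guarded code for the addition example; the helper functions are stand-ins for machine-level operations, not real engine functions:

function isInt32(v)       { return typeof v === "number" && (v | 0) === v; }
function int32Add(a, b)   { return (a + b) | 0; }   // fastpath: raw integer add
function genericAdd(a, b) { return a + b; }         // slowpath: handles any values

function guardedAdd(a, b) {
    if (isInt32(a) && isInt32(b))   // the guards
        return int32Add(a, b);      // fastpath
    return genericAdd(a, b);        // slowpath
}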

While guarding is a useful technique, it imposes a performance penalty. Checking assumptions takes time, and having those checks in hot code can be a significant performance drain. The second technique for emitting efficient code while retaining correctness is invalidation. That’s what I’m going to talk about in this post. I’m going to use SpiderMonkey’s newly added JIT engine, IonMonkey, to illustrate my examples, but aside from the details, the overall strategy applies to most JIT engines.

Invalidation at 10,000 Feet

The key idea behind invalidation is this: instead of making our jitcode check an assumption every time we are about to do something that relies on that assumption, we ask the engine to mark the jitcode as invalid when the assumption becomes false.  As long as we take care never to run jitcode that’s been marked invalid, then we can go ahead and leave out the guards we would have otherwise added.
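
Conceptually, the engine’s dispatch decision looks something like this sketch; it is purely illustrative pseudocode, not SpiderMonkey’s actual mechanism, and the names are made up:

function callFunction(script, args) {
    if (script.jitcode && !script.jitcode.invalidated)
        return script.jitcode.run(args);   // optimized code, guards omitted
    return interpret(script, args);        // otherwise fall back to the interpreter
}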

Let’s take a look at a simple example to get started.  Here’s a small JavaScript program that computes the sum of some point distances:

function Point(x, y) {
    this.x = x;
    this.y = y;
}
 
function dist(pt1, pt2) {
    var xd = pt1.x - pt2.x,
        yd = pt1.y - pt2.y;
    return Math.sqrt(xd*xd + yd*yd);
}
 
function main() {
    var totalDist = 0;
    var origin = new Point(0, 0); 
    for (var i = 0; i < 20000; i++) {
        totalDist += dist(new Point(i, i), origin);
    }
    // The following "eval" is just there to prevent IonMonkey from
    // compiling main().  Functions containing calls to eval don't
    // currently get compiled by Ion.
    eval("");
    return totalDist;
}
 
main();

When this script is run, the dist function is compiled by IonMonkey after roughly 10,000 iterations, generating the following intermediate representation (IR) code:

[1]   Parameter 0          => Value     // (pt1 value)
[2]   Parameter 1          => Value     // (pt2 value)
[3]   Unbox [1]            => Object    // (unbox pt1 to object) 
[4]   Unbox [2]            => Object    // (unbox pt2 to object)
[5]   LoadFixedSlot(x) [3] => Int32     // (read pt1.x, unbox to Int32)
[6]   LoadFixedSlot(x) [4] => Int32     // (read pt2.x, unbox to Int32)
[7]   Sub [5] [6]          => Int32     // xd = (pt1.x - pt2.x)
[8]   LoadFixedSlot(y) [3] => Int32     // (read pt1.y, unbox to Int32)
[9]   LoadFixedSlot(y) [4] => Int32     // (read pt2.y, unbox to Int32)
[10]  Sub [8] [9]          => Int32     // yd = (pt1.y - pt2.y)
[11]  Mul [7] [7]          => Int32     // (xd*xd)
[12]  Mul [10] [10]        => Int32     // (yd*yd)
[13]  Add [11] [12]        => Int32     // ((xd*xd) + (yd*yd))
[14]  Sqrt [13]            => Double    // Math.sqrt(...)
[15]  Return [14]

The above is a cleaned-up version of the actual IR, shown for illustration; the raw low-level IR is considerably more verbose.

In instructions [3] and [4], Ion blindly unboxes the argument values into Object pointers without checking to see if they are actually object pointers and not primitives. In instructions [5], [6], [8], and [9], Ion blindly converts the x and y values loaded from the objects to Int32s.

(Note: Unboxing in this case refers to decoding a generic JavaScript value into a raw value of a specific type such as Int32, Double, Object pointer, or Boolean).

If we were only using guards to check our assumptions, the following extra checks would have gone into this code:

  1. Two type checks: An extra check before each of [3] and [4] to check that the input argument was actually an object pointer before unboxing it.
  2. Four type checks: Ensuring that pt1.x, pt2.x, pt1.y, and pt2.y were all Int32 values before unboxing them.

Instead, the IR code skips these six extra checks, generating extremely tight code. Normally we wouldn’t be able to do this, because the jitcode would simply be wrong if any of these assumptions failed to hold. If dist were ever called with non-object arguments, the jitcode would crash. If dist were ever called with instances of Point whose x and y values were not Int32s, the jitcode would crash.
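
For comparison, here is an illustrative JavaScript-level sketch of what dist would look like with those six guards kept in; isObject and isInt32 are made-up helpers, and in reality the checks live in the generated machine code, with a failed check jumping to slowpath code rather than throwing:

function isObject(v) { return typeof v === "object" && v !== null; }
function isInt32(v)  { return typeof v === "number" && (v | 0) === v; }

function distWithGuards(pt1, pt2) {
    if (!isObject(pt1) || !isObject(pt2))          // 2 type checks on the arguments
        throw new Error("bail to slowpath");       // placeholder for bailing out
    var x1 = pt1.x, x2 = pt2.x, y1 = pt1.y, y2 = pt2.y;
    if (!isInt32(x1) || !isInt32(x2) || !isInt32(y1) || !isInt32(y2))
        throw new Error("bail to slowpath");       // 4 type checks on the fields
    var xd = x1 - x2, yd = y1 - y2;
    return Math.sqrt(xd*xd + yd*yd);
}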

To get away with generating this efficient code while maintaining correctness, we must make sure that we never run it after its assumptions become invalid. Accomplishing that requires leveraging SpiderMonkey’s type inference system.

Type Inference

SpiderMonkey’s type inference (TI) dynamically tracks known type information for JavaScript objects and scripts, and how those types flow through different parts of the code. The data structures maintained and updated by TI represent the current type knowledge about the program. By tracking when these structures change, the engine can trigger events whenever particular type assumptions are broken.

The following is a very simplified diagram of the type model generated for our program as it runs.

The TypeScript at the top keeps track of the type information for the function dist – such as the type sets associated with the first and second arguments (‘pt1’ and ‘pt2’).

The TypeObject diagrammed below keeps track of the type information associated with instances of Point – such as the type sets associated with the fields ‘x’ and ‘y’.

These structures will be updated by TI as needed. For example, if dist is ever called with an object that is not an instance of Point, TI will update the appropriate argument type set on the TypeScript. Likewise, if Point is ever instantiated with values for ‘x’ or ‘y’ that are not integers, the property type sets on the TypeObject will be updated.
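
To make this concrete, here is a rough model of that information written as plain JavaScript objects, purely for illustration; the real TypeScript and TypeObject structures are engine internals, and the field names here are made up:

var distTypeScript = {
    arg1Types: ["Point object"],    // types observed flowing into pt1
    arg2Types: ["Point object"]     // types observed flowing into pt2
};

var pointTypeObject = {
    propertyTypes: {
        x: ["Int32"],               // types observed being written to .x
        y: ["Int32"]                // types observed being written to .y
    }
};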

When Ion JIT-compiles a function and makes implicit assumptions about types, it adds invalidation hooks to the appropriate type sets, like so:

Whenever new types are added to type sets, the relevant invalidation hook will be triggered, which will cause the jitcode to be marked invalid.
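
Conceptually, the mechanism looks something like the following sketch; the names and structure are made up for illustration, and the real machinery is implemented inside the engine rather than in JavaScript:

function addType(typeSet, newType) {
    if (typeSet.types.indexOf(newType) !== -1)
        return;                               // type already known: no hooks fire
    typeSet.types.push(newType);              // the type set just grew, so...
    typeSet.invalidationHooks.forEach(function (hook) {
        hook.ionScript.invalidated = true;    // ...mark dependent jitcode invalid
    });
}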

Experiments With Ion

Let’s see if we can experimentally trigger these kinds of invalidations by changing the code above. I’ll be using the standalone JavaScript shell built from the Mozilla sources to run these examples. You can verify these tests yourself by building a debug version of the JS shell. (See the SpiderMonkey build documentation for more information).

First, let’s start easy – and verify that no invalidations occur with the script above:

$ IONFLAGS=osi js-debug points.js
[Invalidate] Start invalidation.
[Invalidate]  No IonScript invalidation.

(Ignore the spurious extra “Start invalidation” in this and subsequent outputs – it’s due to garbage collection starting, which makes the engine check for potential invalidations caused by the GC. It’s not relevant to this post.)

Passing the environment variable IONFLAGS=osi just asks Ion to dump all invalidation-related events to its output as it runs. As the output shows, this program causes no invalidations, since no type assumptions are broken after compilation.

Stab 2

For our second try, let’s break the assumption that all arguments to dist are instances of Point, and see what happens. Here is the modified main() function:

...
function main() {
    var totalDist = 0;
    var origin = new Point(0, 0); 
    for (var i = 0; i < 20000; i++) {
        totalDist += dist(new Point(i, i), origin);
    }
    dist({x:3,y:9}, origin); /** NEW! **/
 
    // The following "eval" is just there to prevent IonMonkey from
    // compiling main().  Functions containing calls to eval don't
    // currently get compiled by Ion.
    eval("");
    return totalDist;
}
...

In this program, the loop runs long enough to make dist hot, so it gets compiled with the assumption that its arguments will always be instances of Point. Once the loop completes, we call dist again, but this time its first argument is a plain object that is not an instance of Point, which breaks the assumption made by the jitcode.

$ IONFLAGS=osi js-debug points.js
[Invalidate] Start invalidation.
[Invalidate]  No IonScript invalidation.
[Invalidate] Start invalidation.
[Invalidate]  Invalidate points.js:6, IonScript 0x1f7e430

(once again, the first “Start invalidation” is just caused by GC kicking in, and it’s not relevant to this post.)

Aha! We’ve managed to trigger invalidation of dist (points.js, line 6). Life is good.

The diagram below shows what’s happening. The dist TypeScript’s first-argument-typeset changes, adding a reference to a new TypeObject. This triggers its invalidation hook, and marks the IonScript invalid:

Stab 3

For our third stab, we break the assumption that fields of Point instances always contain Int32 values. Here is the modified main() function:

...
function main() {
    var totalDist = 0;
    var origin = new Point(0, 0); 
    for (var i = 0; i < 20000; i++) {
        totalDist += dist(new Point(i, i), origin);
    }
    dist(new Point(1.1, 5), origin); /** NEW! **/
 
    // The following "eval" is just there to prevent IonMonkey from
    // compiling main().  Functions containing calls to eval don't
    // currently get compiled by Ion.
    eval("");
    return totalDist;
}
...

And the associated output:

$ IONFLAGS=osi js-debug points.js
[Invalidate] Start invalidation.
[Invalidate]  No IonScript invalidation.
[Invalidate] Start invalidation.
[Invalidate]  Invalidate points.js:6, IonScript 0x1f7e430

And the diagram showing how the script is invalidated:

Invalidation vs. Guarding

Invalidation and guarding are two distinct approaches to generating jitcode for any given operation. Within the jitcode for any given function, both techniques may be used, and neither one is strictly better than the other in all cases.

The advantage of invalidation-based optimization is that guards can be omitted, allowing very hot code to execute extremely quickly. However, there are also downsides. When an assumption fails, every piece of jitcode that relies on that assumption must be marked invalid and prevented from being executed. This means the function goes back to running in the interpreter until we compile it again with a new set of assumptions. If we have the bad luck of choosing invalidation-based optimization on an assumption that happens to break, then we’re suddenly stuck with a very high cost.

With guard-based optimization, we sacrifice the speed of the optimized code, but the resulting code is more resilient to changes in assumptions – we don’t have to throw away jitcode when assumptions break. In those cases, the code will just execute a slowpath. If the number of times a function is called with invalid assumptions is small (but greater than zero), then guard-based optimization might be a better choice than invalidation-based optimization.

The question of which optimization strategy to pick, and when, is not always easily answered. Even if an assumption is invalidated occasionally, the jitcode that relies on it may be hot enough to make invalidation-based optimization worthwhile: we pay the cost of the occasional recompile, and the time saved in between makes up for it. Alternatively, it may not be worth the cost. Beyond a certain point, deciding between the two approaches becomes a matter of heuristics and tuning.
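
As a purely illustrative back-of-the-envelope, with completely made-up numbers:

var guardCostNs     = 2;        // assumed cost of one guard check per execution
var recompileCostNs = 1e6;      // assumed cost of one invalidation plus recompile
var hotExecutions   = 1e7;      // executions between broken assumptions

var guardTotal        = guardCostNs * hotExecutions;  // 20,000,000 ns of guard overhead
var invalidationTotal = recompileCostNs;              //  1,000,000 ns paid once
// With these numbers invalidation wins by roughly 20x; drop hotExecutions to 1e5
// and guarding wins instead. Estimating that break-even point is the heuristics' job.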

On-Stack Invalidation

The post so far gives a general idea of the main principles behind invalidations. That said, there is a very important corner case with invalidation that all JIT engines need to take care of.

The corner case is called on-stack invalidation, and it arises because sometimes, the jitcode that becomes invalid is code that’s on the call stack (i.e., it’s one of the callers of the code that’s currently executing). This means that once the current code finishes running, it’s going to perform a machine-level return into jitcode that we can no longer safely execute.

Handling this case requires a bit of juggling, and consideration for a number of subtle errors that may arise. I’ll cover more on that in a follow-up post.

IonMonkey in Firefox 18

David Anderson


Today we enabled IonMonkey, our newest JavaScript JIT, in Firefox 18. IonMonkey is a huge step forward for our JavaScript performance and our compiler architecture. It has also been a highly focused, year-long project for the IonMonkey team, and we’re super excited to see it land.

SpiderMonkey has a storied history of just-in-time compilers. Throughout all of them, however, we’ve been missing a key component you’d find in typical production compilers, like those for Java or C++. Both the old TraceMonkey* and the newer JägerMonkey had a fairly direct translation from JavaScript to machine code. There was no middle step. There was no way for the compilers to take a step back, look at the translation results, and optimize them further.

IonMonkey provides a brand new architecture that allows us to do just that. It essentially has three steps:

  1. Translate JavaScript to an intermediate representation (IR).
  2. Run various algorithms to optimize the IR.
  3. Translate the final IR to machine code.

We’re excited about this not just for performance and maintainability, but also for making future JavaScript compiler research much easier. It’s now possible to write an optimization algorithm, plug it into the pipeline, and see what it does.
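
Conceptually, the pipeline looks something like the sketch below; the function names are illustrative only and do not correspond to IonMonkey’s actual internal API:

function compile(jsFunction, optimizationPasses) {
    var ir = translateToIR(jsFunction);        // step 1: JavaScript -> IR
    optimizationPasses.forEach(function (pass) {
        ir = pass(ir);                         // step 2: transform/optimize the IR
    });
    return generateMachineCode(ir);            // step 3: final IR -> machine code
}

The important point is step 2: a new optimization only needs to understand the IR, not the source language or the target CPU.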

Benchmarks

With that said, what exactly does IonMonkey do to our current benchmark scores? IonMonkey is targeted at long-running applications (we fall back to JägerMonkey for very short ones). I ran the Kraken and Google V8 benchmarks on my desktop (a Mac Pro running Windows 7 Professional). On the Kraken benchmark, Firefox 17 runs in 2602ms, whereas Firefox 18 runs in 1921ms, making for roughly a 26% performance improvement. For the graph, I converted these times to runs per minute, so higher is better:

On Google’s V8 benchmark, Firefox 15 gets a score of 8474, and Firefox 17 gets a score of 9511. Firefox 18, however, gets a score of 10188, making it 7% faster than Firefox 17, and 20% faster than Firefox 15.

We still have a long way to go: over the next few months, now with our fancy new architecture in place, we’ll continue to hammer on major benchmarks and real-world applications.

The Team

For us, one of the coolest aspects of IonMonkey is that it was a highly-coordinated team effort. Around June of 2011, we created a somewhat detailed project plan and estimated it would take about a year. We started off with four interns – Andrew Drake, Ryan Pearl, Andy Scheff, and Hannes Verschore – each implementing critical components of the IonMonkey infrastructure, all pieces that still exist in the final codebase.

In late August 2011 we started building out our full-time team, which now includes Jan de Mooij, Nicolas Pierron, Marty Rosenberg, Sean Stangl, Kannan Vijayan, and myself. (I’d also be remiss not mentioning SpiderMonkey alumnus Chris Leary, as well as 2012 summer intern Eric Faust.) For the past year, the team has focused on driving IonMonkey forward, building out the architecture, making sure its design and code quality is the best we can make it, all while improving JavaScript performance.

It’s really rewarding when everyone has the same goals, working together to make the project a success. I’m truly thankful to everyone who has played a part.

Technology

Over the next few weeks, we’ll be blogging about the major IonMonkey components and how they work. In brief, I’d like to highlight the optimization techniques currently present in IonMonkey:

  • Loop-Invariant Code Motion (LICM), or moving instructions outside of loops when possible (see the sketch after this list).
  • Sparse Global Value Numbering (GVN), a powerful form of redundant code elimination.
  • Linear Scan Register Allocation (LSRA), the register allocation scheme used in the HotSpot JVM (and until recently, LLVM).
  • Dead Code Elimination (DCE), removing unused instructions.
  • Range Analysis, eliminating bounds checks (will be enabled after bug 765119).
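
As a taste of what one of these passes does, here is loop-invariant code motion shown at the JavaScript source level; this is purely illustrative, since the real transformation operates on the IR rather than on source code:

function beforeLICM(arr, x) {
    for (var i = 0; i < arr.length; i++)
        arr[i] = arr[i] * (x * x + 1);   // x*x + 1 is recomputed every iteration
}

function afterLICM(arr, x) {
    var k = x * x + 1;                   // invariant expression hoisted out of the loop
    for (var i = 0; i < arr.length; i++)
        arr[i] = arr[i] * k;
}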

Of particular note, I’d like to mention that IonMonkey works on all of our Tier-1 platforms right off the bat. The compiler architecture is abstracted to require minimal replication of code generation across different CPUs. That means the vast majority of the compiler is shared between x86, x86-64, and ARM (the CPU used on most phones and tablets). For the most part, only the core assembler interface must be different. Since all CPUs have different instruction sets – ARM being totally different than x86 – we’re particularly proud of this achievement.

Where and When?

IonMonkey is enabled by default for desktop Firefox 18, which is currently Firefox Nightly. It will be enabled soon for mobile Firefox as well. Firefox 18 becomes Aurora on Oct 8th, and Beta on November 20th.

* Note: TraceMonkey did have an intermediate layer. It was unfortunately very limited. Optimizations had to be performed immediately and the data structure couldn’t handle after-the-fact optimizations.

Incremental GC in Firefox 16!

Bill McCloskey


Firefox 16 will be the first version to support incremental garbage collection. This is a major feature, over a year in the making, that makes Firefox smoother and less laggy. With incremental GC, Firefox responds more quickly to mouse clicks and key presses. Animations and games will also draw more smoothly.

The basic purpose of the garbage collector is to collect memory that JavaScript programs are no longer using. The space that is reclaimed can then be reused for new JavaScript objects. Garbage collections usually happen every five seconds or so. Prior to incremental GC landing, Firefox was unable to do anything else during a collection: it couldn’t respond to mouse clicks or draw animations or run JavaScript code. Most collections were quick, but some took hundreds of milliseconds. This downtime can cause a jerky, frustrating user experience. (On Macs, it causes the dreaded spinning beachball.)

Incremental garbage collection fixes the problem by dividing the work of a GC into smaller pieces. Rather than do a 500 millisecond garbage collection, an incremental collector might divide the work into fifty slices, each taking 10ms to complete. In between the slices, Firefox is free to respond to mouse clicks and draw animations.
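
The general scheduling pattern can be sketched in plain JavaScript; this only illustrates the idea of working against a time budget and is not the collector’s actual code:

function runInSlices(workItems, budgetMs) {
    var start = Date.now();
    while (workItems.length > 0 && Date.now() - start < budgetMs)
        workItems.shift()();                   // do one small unit of work
    if (workItems.length > 0)                  // out of budget: yield, resume later
        setTimeout(function () {               // in between, the browser can handle
            runInSlices(workItems, budgetMs);  // clicks and draw frames
        }, 0);
}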

I’ve created a demo to show the difference made by incremental GC. If you’re running a Firefox 16 beta, you can try it out here. (If you don’t have Firefox 16, the demo will still work, although it won’t perform as well.) The demo shows GC performance as an animated chart. To make clear the difference between incremental and non-incremental GC, I’ll show two screenshots from the demo. The first one was taken with incremental GC disabled. Later I’ll show a chart with incremental collections enabled. Here is the non-incremental chart:

Time is on the horizontal axis; the red dot moves to the right and shows the current time. The vertical axis, drawn with a log scale, shows the time it takes to draw each frame of the demo. This number is the inverse of frame rate. Ideally, we would like to draw the animation at 60 frames per second, so the time between frames should be 1000ms / 60 = 16.667ms. However, if the browser needs to do a garbage collection or some other task, then there will be a longer pause between frames.

The two big bumps in the graph are where non-incremental garbage collections occurred. The number in red shows the time of the worst bump – in this case, 260ms. This means that the browser was frozen for a quarter of a second, which is very noticeable. (Note: garbage collections often don’t take this long. This demo allocates a lot of memory, which makes collections take longer, in order to demonstrate the benefits of incremental GC.)

To generate the chart above, I disabled incremental GC by visiting about:config in the URL bar and setting the javascript.options.mem.gc_incremental preference to false. (Don’t forget to turn it on again if you try this yourself!) If I enable incremental GC, the chart looks like this:

This chart also shows two collections. However, the longest pause here is only 67ms. This pause is small enough that it is unlikely to be discernible. Notice, though, that the collections here are more spread out. In the top image, the 260ms pause is about 30 pixels wide. In the bottom image, the GCs are about 60 pixels wide. That’s because the incremental collections in the bottom chart are split into slices; in between the slices, Firefox is drawing frames and responding to input. So the total duration of the garbage collection is about twice as long. But it is much less likely that anyone will be affected by these shorter collection pauses.

At this point, we’re still working heavily on incremental collection. There are still some phases of collection that have not been incrementalized. Most of the time, these phases don’t take very long. But users with many tabs open may still see unacceptable pauses. Firefox 17 and 18 will have additional improvements that will decrease pause times even more.

If you want to explore further, you can install MemChaser, an addon for Firefox that shows garbage collection pauses as they happen. For each collection, the worst pause is displayed in the addon bar at the bottom of the window. It’s important to realize that not all pauses in Firefox are caused by garbage collection. You can use MemChaser to correlate the bumps in the chart with garbage collections reported by MemChaser.

If there is a bump when no garbage collection happened, then something else must have caused the pause. The Snappy project is a larger effort aimed at reducing pauses in Firefox. They have developed tools to figure out the sources of pauses (often called “jank”) in the browser. Probably the most important tool is the SPS profiler. If you can reliably reproduce a pause, then you can profile it and figure out what Firefox code was running that made us slow. Then file a bug!

Introducing the official Mozilla JavaScript team blog

Naveed Ihsanullah


Mozilla’s mission is “to promote openness, innovation and opportunity on the web.” Here on the JavaScript engine team we have unique opportunities to support this mission. Our work stretches from technical challenges like Incremental Garbage Collection to working out the details of new language features in JavaScript.next.

We have many great projects going on that we are excited to share with you. This blog will make it easier for everyone to keep up and stay involved with SpiderMonkey as the team helps build a better web.