Main menu:

Site search

Categories

Archive

Know Your Engines at O’Reilly Velocity 2011

Yesterday I gave a talk at O’Reilly Velocity 2011, “Know Your Engines: How to make your JavaScript fast”. The concept came from the common idea that C++ programmers should know something about how the compiler works on their code in order to know how to get good performance. Why not for JS? So I basically offered what I know as a performance-focused engine developer: how the engine works in general, how it tries to make code fast, and what kind of code will defeat engine optimizations and make things slow. I figure it was pretty heavy technical stuff (including plenty of computer science) by the usual standards, but it seems to have gone over pretty well, judging by the tweets, and hopefully it will help developers make faster and more capable JavaScript apps.

I covered the full range of engines, from the interpreter-only designs of a few years ago (as background), to the function-by-function JITs currently in the latest version of every major browser (e.g., JägerMonkey), the current and possible future type-specializing JITs (Tracemonkey and Crankshaft, with brief mention of David Anderson‘s very exciting IonMonkey project), and a little bit on Brian Hackett’s type inference branch. (I heard afterward that Microsoft’s Chakra team is working on type inference as well, but a quick search didn’t turn up any real info–if anyone knows more, I’m curious.) The high bit is that if you write JavaScript that could have Java-oid type declarations applied to it, current and future engines will for the most part be able to magically figure out what those declarations would be, and make your code much faster. There’s a lot more in there about things like what a type really is in the engine, how property lookups are optimized, and what kinds of properties are fastest.

GC pauses are a major concern for interactive apps and games, so I had a section on how GCs work, what’s coming from engine developers to control pauses, and what can be done for now. I had to go kind of quickly through that section, so I want to add a few more notes here:

Bill McCloskey has a prototype of incremental GC for Firefox. It can slice the mark phase up finely enough to keep pause times below 1 ms. It doesn’t directly address sweeping, but we do most sweeping on a background thread, anyway (and sweeping can also be interleaved with allocations in order to keep pauses down). Currently the write barrier slows things down too much for production, so he’s still checking that out.

Recommendations for avoiding GC pauses depend on what kind of GC the browser has. For now, if you want to reduce pauses on all browsers, you have to assume a simple stop-the-world mark-sweep collector, in which case there are 2 key things to know. First, the length of pauses is roughly proportional to the total size of reachable heap objects (which include JS objects, arrays, strings, and functions), because the mark phase has to traverse them all. Second, the frequency of pauses depends on the allocation rate, that is, the number of bytes allocated per second, because using up memory is what triggers the GC to run.

A simple model of a program running under generational GC is that it starts up, allocates a bunch of objects, and then settles down into a steady state. In the steady state, there is a working set of W (bytes) allocated memory that stays allocated, and then a churn rate R (bytes/sec), where every second R bytes of temporary objects are allocated (e.g., short-term closures, strings being concatenated into results). R determines how often pauses occur, and W how long the pauses are. So, you could entirely avoid pauses in practice by having either W ≈ 0 (so pauses are unnoticeable) or R ≈ 0 (so pauses don’t happen in the steady state, only during startup).

I finished the talk with some gritty practical bits I discovered by running a whole bunch of microbenchmarks, based on educated guesses about what would turn up performance faults. Probably the most important is that eval, with, and throwing exceptions are not only generally slow operations, but they can deoptimize nearby code and cause much greater slowdowns than you might expect. Another big one is that dense arrays are much faster than sparse arrays–see the slides for details.

My slides are online at the O’Reilly page for my talk. I also have my own copy of the original Keynote version, and a pdf. Demis Bellot has also conveniently put them on SlideShare.

Thanks to everyone who tweeted praise for/interest in my talk (that I found in my mentions at least): piscisaureus, kamidphish, colinbdclark, rednaxelafx, djco, demisbellot, porteneuve, brendankenny, paulbiggar, slicknet, demisbellot, alex_gaynor, devongovett, juanpin, derek, SkipSauls, and of course Brendan. I truly appreciate your letting me know you enjoyed the talk (or expect to enjoy it!). And thanks again to Steve Souders and the Velocity conference chairs for hosting me and making it all possible!

Comments

Comment from skierpage
Time: June 16, 2011, 6:05 pm

Great slides, thank you.
“The key to running faster in future JITs is
type-stable JavaScript.”

So put optional types into JavaScript 6! Help me avoid coding errors while making execution faster, let me say size is an integer and htmlOut is a string. Just as I can say const a = 7. What says Mr. Eich ?

Does using JavaScript typed arrays, e.g. putting all my integer variables into Uint16Array, make generated code faster? Will the proposed StructType?

Comment from Brendan Eich
Time: June 16, 2011, 9:19 pm

skierpage: ES4 is dead, get over it. :-|

Optional runtime “guards” are proposed for Harmony, but as defined they probably just slow your code down. JS does arithmetic in double, so annotating int and requiring wraparound is slower than allowing for possible overflow to double (AS3 users found this out the hard way).

Also, it’s not up to me alone what browser vendors and others on Ecma TC39 agree on. SpiderMonkey does not implement anything not proposed as a strawman, if not approved for Harmony.

/be

Pingback from David Mandelin's blog » Know Your Engines at O'Reilly Velocity 2011 | Internet blog
Time: June 16, 2011, 11:40 pm

[...] link: David Mandelin's blog » Know Your Engines at O'Reilly Velocity 2011 Bookmark to: This entry was posted in Uncategorized and tagged mandelin. Bookmark the permalink. [...]

Comment from David Bruant
Time: June 17, 2011, 4:14 am

This presentation is sooooooooooooooooo gooooooooooooooooooood!
A lot of things I wanted to learn about SpiderMonkey (and JS engines in general) is here.

Thanks a lot!

Ps: Still no intention to work on parallel GC to avoid pauses entirely?

Pingback from Sit back and watch; Great talks on JavaScript and the Web | FunctionSource Development
Time: June 17, 2011, 8:19 am

[...] Mandelin of Mozilla gave a great talk at Velocity this week sharing how JavaScript engines work. You always have to be careful about what to do with information on internals, as you don’t [...]

Comment from dmandelin
Time: June 17, 2011, 12:53 pm

I think I briefly mentioned typed arrays in my talk.

In theory, I think typed arrays should usually help. They help because for ints, the array can be smaller, giving better cache locality (for large arrays); type tags don’t have to be read and written; and because if you are going to do arithmetic on the result, you don’t need to check the type tag for the element you read. (You do need to check the type tag on the typed array itself, but I imagine for many programs that could be hoisted above hot loops.) With typed arrays you may need a type check to store into the array, would could slightly hurt in some cases. (But with floating-point usages, there usually shouldn’t be much checking, since overflow doesn’t come into play.)

The main problem is that since typed arrays are newish and not used in the popular benchmarks, browsers don’t always have full IC support, or maximally efficient ICs there–there are a lot of different types, and some tricky issues come up, e.g., with Firefox 4’s nan-boxing, you need to check for non-canonical nans coming out.

For now, you just have to try it out. I think some of Alon’s Enscriptem (http://mozakai.blogspot.com/search/label/emscripten) programs get good speedups with typed arrays.

Comment from Craig McPheat
Time: June 17, 2011, 2:56 pm

Thanks for the slides :)

Comment from Laura
Time: June 18, 2011, 11:32 am

Car Rentals Costa Rica

Comment from AndersH
Time: June 18, 2011, 1:45 pm

When type inference is implemented I hope the browser will include a way to view (or an api for an addon such as firebug to to view) which type could be inferred, such that webdevs can check if the browser inferres variables in any critical code as expected.

Comment from Nicolas
Time: June 19, 2011, 10:43 pm

Following the thought of AndersH, is there a way to “get” the type inference part of the engine and use it as a syntax/code checker for performance optimizations in code in javascript for some IDE/editor? Can the syntax checker tell the reason why it can’t optimize the code? (different shapes, etc).

Comment from Andy Wingo
Time: June 20, 2011, 6:52 am

Nice slides.

One note though, the speed at which a closure’s free variables are accessed need not depend on their nesting depth. The set of closure variables needed by a procedure is cheap to compute at compile-time. When the closure is made, those variables may be copied into a vector associated with the closure, and thereafter accessed by index.

At that point for free variables that are never mutated, you’re done. For free variables that are set, you can still copy the variable into a vector, but you need to allocate the variable in a “box”, and rewrite accesses to go through that box.

Another advantage of this technique is that a closure only holds onto the values that it needs. Any other lexical value is free to be collected.

Cheers,

Andy

Comment from dmandelin
Time: June 20, 2011, 9:00 am

@AndersH @Nicolas One thing to keep in mind is that the type inference is not “sound”. That is, it will often compute a set of possible types that is smaller than the true set of possible types. This is OK (and in fact good for optimization)–there are checks that detect when the type inference has been overoptimistic, and then it revises its results. But this may or may not be so good for use in program understanding.

Tools to display optimization hints along with source code are a great idea–Steve Fink recently checked in a facility for telling which backend the code runs in, although we don’t have tools yet.

Comment from Brendan Eich
Time: June 20, 2011, 9:41 am

@Andy: SpiderMonkey does the Chez Scheme “display closure” optimization, called “flat closures” in our code, but only where the closure postdominates the upvar’s initialization. We don’t do assignment conversion (heap-boxing mutable upvars).

Your point is good in general: absent eval (which can inject shadowing vars), and of course ‘with’, JS has static scope, so the cost of accessing an upvar should not depend on its static level difference from the static level of the closure referencing it.

Indeed the same shape-based PIC techniques work with “Call” objects (reified activation objects) in SpiderMonkey. Our approach does not require checking each shape along the path from upvar use to definition, although I’m not sure that the method JIT takes advantage of this yet.

Regards,

/be

Comment from Geoffrey Sneddon
Time: June 21, 2011, 1:08 pm

Few notes on Carakan (almost all from the first two blog posts on it):

Carakan has a run-time type-specializing JIT (always has, since it first shipped in 10.50 15 months ago). It doesn’t however have a JIT that doesn’t do that (stuff runs in interpreter before that, and anything slow-cased goes to the interpreter, though of course slow-case too often and we’ll recompile).

We also should have ICs that handle properties not present, too, and rudimentary benchmarking tends to confirm this. (It’s 50% quicker than a property access.)

Comment from s
Time: June 21, 2011, 2:19 pm

Question on the whole IE JS engine vs JägerMonkey/IonMonkey…
How do you guys know that MS is not copying your code (or at least ideas) and incorporating them into IE… considering that FF is open source vs IE closed source?
Just curious.

Pingback from New JavaScript Engine Module Owner | Brendan Eich
Time: June 21, 2011, 6:30 pm

[...] know the rest: JS performance has grown an order of magnitude over the last several years. Indeed, JS still has upside undreamed of in the Java world where 1% [...]

Comment from Brendan Eich
Time: June 22, 2011, 12:09 am

@s We hope they are copying! No problem with that, it’s part of the open source package deal.

Alas I believe Microsoft has too many lawyers, so their engineers are reported to be forbidden to read open source.

/be

Comment from dmandelin
Time: June 22, 2011, 8:04 am

Thanks for the additional notes on Carakan. I’m curious about the input to the type specializer–there’s a huge design space there–tracing, profiling, local static type inference, whole program inference, and all the various hybrids.