
asm.js in Firefox Nightly

I’m happy to announce that OdinMonkey, an asm.js optimization module for Firefox’s JavaScript engine, is now in Nightly builds and will ship with Firefox 22 in June.

What is asm.js? Why are we doing it, and how are we getting to within 2x of native performance? This post won’t be able to go into too much detail since we’re hard at work preparing for Mozilla’s upcoming GDC session, which you should definitely come see (Wednesday 11am, Room 3024, West Hall). After GDC, expect full coverage of these topics by Alon, Dave, myself and surely others. For now, allow me to point you at the asm.js FAQ, Alon’s mloc.js slides, a nice Badass JavaScript post and a more in-depth post by Axel Rauschmayer.

Want to see it in action? Download a new Firefox Nightly build and try out BananaBench. (Note: BananaBench runs with a fixed time step to make JS execution deterministic, so game speed will run fast/slow, depending on your hardware/browser.) Or, check out a demo of the Emscripten-compiled Bullet engine simulating a ton of falling boxes.

At the moment, we have x86/x64 support on desktop Windows/Mac/Linux and support for mobile Firefox on ARM is almost done. Since we intend to continue to iterate on the asm.js spec in cooperation with other JS engines, we’ve put OdinMonkey behind the flag javascript.options.asmjs in about:config. This flag is currently enabled by default on Nightly and Aurora, and if nothing changes over the next 12 weeks, will be automatically disabled in Beta and Release. By then, we hope to be happy with a stable “asm.js v.1”, at which point we’ll enable it everywhere and ship with it enabled in our final builds. [Update: OdinMonkey has been enabled by default for all releases starting with Firefox 22.]

If you want to start experimenting with asm.js right now, you can:

  • Get Emscripten and start compiling C/C++ code. (Don’t forget the -O2 -s ASM_JS=1.)
  • Check out the draft spec and start writing asm.js by hand.

In the future, we’d like to see a rich third option of generating asm.js using a more ergonomic front-end language (e.g., a derivative of LLJS). [Update: LLJS work is already underway!]

How do you know if you are generating valid asm.js and taking full advantage of OdinMonkey? In the old days, this was a frustrating question for developers. Maybe you were doing something wrong, maybe the code, as written, was just slow. One cool thing about asm.js is that the "use asm" directive makes the programmer’s intention quite clear: they want to compile asm.js. Thus, if there is an asm.js validation error, OdinMonkey will print a warning on the JavaScript console. (OdinMonkey emits a warning, instead of throwing an error, since asm.js is just JavaScript and thus cannot change JavaScript semantics.) In fact, since silence is ambiguous, OdinMonkey also prints a message on successful compilation of asm.js. (There is currently a bug preventing asm.js optimization and warnings in Scratchpad and the Web Console, so for now experiment in regular content.)
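For a sense of what a hand-written module looks like, here is a minimal sketch. The names AddModule and add are made up for illustration, and the coercions follow the draft spec as I understand it. In an engine without asm.js support, the code simply runs as ordinary JavaScript.

```javascript
// Hypothetical module for illustration; 'stdlib', 'foreign' and 'heap'
// are the conventional names for the three module parameters.
function AddModule(stdlib, foreign, heap) {
  "use asm";

  function add(x, y) {
    x = x | 0;            // parameter type annotation: int
    y = y | 0;
    return (x + y) | 0;   // return type annotation: signed int
  }

  return { add: add };
}

const add = AddModule(globalThis).add;
console.log(add(2, 3)); // prints 5
```

Because the result is coerced with `| 0`, the addition wraps around like a 32-bit integer add rather than producing a double.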

For those who are itching to do some performance experiments: go for it, but keep two things in mind. First, we’ve been pretty happy with the results so far when asm.js is applied to new codebases, but we’ve also seen plenty of cases where the C++ compiler is doing important backend optimizations that we haven’t taught our IonMonkey backend yet. We expect continuous incremental improvement as we measure and implement new optimizations to close this gap. Second, one performance fault that we already know trips up people trying to benchmark asm.js is that calling from non-asm.js into asm.js and vice versa is much slower than normal calls due to general-purpose enter/exit routines. We plan to fix this in the next few months but, in the meantime, for benchmarking purposes, try to keep the whole computation happening inside a single asm.js module, not calling in and out.
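To make the second point concrete, here is a rough sketch of a benchmark kernel whose loop lives entirely inside the module, so the slow enter/exit path is paid once per call to sumTo rather than once per iteration. SumModule and sumTo are hypothetical names chosen for this example.

```javascript
// Hypothetical benchmark kernel: the whole loop runs inside asm.js code.
function SumModule(stdlib) {
  "use asm";

  function sumTo(n) {
    n = n | 0;
    var i = 0, total = 0;
    for (i = 0; (i | 0) < (n | 0); i = (i + 1) | 0) {
      total = (total + i) | 0;
    }
    return total | 0;
  }

  return { sumTo: sumTo };
}

const sumTo = SumModule(globalThis).sumTo;

// One crossing of the asm.js boundary, regardless of n.
console.log(sumTo(5)); // prints 10
```

The version to avoid when benchmarking is the one where a plain-JS loop calls a tiny asm.js function on every iteration, since that mostly measures the call overhead.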

In closing, I leave you with the musical inspiration for OdinMonkey:

Happy hacking!

Optimizing JavaScript variable access

Nicolas Pierron has provided a French translation of this post. Thanks!

I recently finished a project to improve how SpiderMonkey implements variable access so I thought this would be a good time to explain how it all works now. Taking a note from mraleph’s post (and SICP), I’ll illustrate the implementation using JavaScript as the implementation language. That is, I’ll translate JavaScript using full-featured variable access into JavaScript that doesn’t, rather like how the original C++ compiler translated C++ into C.

Before starting, let me set up the problem space. By variable I’m referring not just to the variables introduced by var, but also those introduced by let, const, catch, function statements, and function argument lists. By variable access, I mean a read or a write. Variable access can take many forms:

  • Local access (i.e., access to a variable in the same function):
    function add(x,y) { return x+y }
  • Non-local access (i.e., access to a variable in an enclosing function):
    function add(x,y) { return (function() { return x+y })() }
  • Access from dynamically-generated code:
    function add(x,y) { return eval("x+y") }
  • Access after dynamic scope modification via non-strict direct eval:
    function add(a,b) { eval("var x="+a+", y="+b); return x+y }
  • Dynamic function argument access via the arguments object:
    function add(x,y) { return arguments[0]+arguments[1] }
  • Unexpected debugger snooping (via Firebug, the new builtin Firefox debugger, or directly from privileged JS using the new Debugger API):
    dbg.onDebuggerStatement = function(f) { return f.eval("x+y") }

To keep the post small(-ish), I’ll pretend there is only (non-strict, direct) eval and ignore strict and indirect eval as well as with (which we generally deoptimize as if it was an eval). I’ll also ignore let, global access optimizations, the bizarre things SpiderMonkey does for block-level function statements, and the debugger.

The worst case

To rise above, we must first see how low we need to go in the worst case. Consider the following function:

function strange() {
  eval("var x = 42");
  return function xPlus1() { var z = x + 1; return z }
}

Here, eval is dynamically adding x to the scope of strange where it will be read by xPlus1. Since eval can be called with a dynamically-constructed string we must, in general, treat function scopes as dynamic maps from names to values. (Fun fact: names added by eval can be removed using the delete keyword, so the map can both grow and shrink at runtime!)
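The grow-and-shrink behaviour can be observed directly. The sketch below builds its function with the Function constructor, which always produces non-strict code, so it behaves the same even if the surrounding file is strict or a module. The name growAndShrink is made up for this example.

```javascript
// Function-constructor bodies are non-strict unless they opt in to strict
// mode, so direct eval can add a 'var' to the function's own scope here.
const growAndShrink = new Function(`
  eval("var x = 42");       // adds 'x' to this function's scope
  var before = typeof x;
  delete x;                 // removes it again
  var after = typeof x;
  return [before, after];
`);

const [before, after] = growAndShrink();
console.log(before, after); // prints: number undefined
```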

To make this more concrete, we’ll implement scopes in JS using ES6 Map objects. We’ll give every function its own Map that will be stored in a local variable named scope and hold all the function’s variables. (Yes, we’re using a variable to implement variables; but since we’ll only use a small finite number of them, we can think of them as registers.)

function strange() {
  // the scope of 'strange' is initially empty
  var scope = new Map;

  // eval("var x = 42") effectively executes:
  scope.set('x', 42);

  return function xPlus1() {
    // vars are hoisted so scope initially contains 'z'
    var scope = new Map([['z', undefined]]);

    // var z = x + 1
    scope.set('z', scope.get('x') + 1);  // oops!

    // return z
    return scope.get('z');
  }
}

As the comment indicates, there is a bug in xPlus1: x isn’t in the scope of xPlus1, it’s in the scope of strange! To fix this we need to do two things:

  1. Add an enclosing field to all scope objects indicating the enclosing function’s scope (or the global object if the function is top-level).
  2. Replace uses of scope.get with a lookup algorithm that walks the chain of scopes.

function strange() {
  // the scope of 'strange' is initially empty
  var scope = new Map;
  scope.enclosing = window;

  // eval("var x = 42") effectively executes
  scope.set('x', 42);

  var tmp = function xPlus1() {
    // vars are hoisted so scope initially contains 'z'
    var scope = new Map([['z', undefined]]);
    scope.enclosing = xPlus1.enclosing;

    // var z = x + 1
    scope.set('z', lookup(scope, 'x') + 1);

    // return z
    return lookup(scope, 'z');
  }
  tmp.enclosing = scope;
  return tmp;
}

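function lookup(scope, name) {
  // walk outward until a scope defines 'name' or we reach the global object
  while (scope instanceof Map && !scope.has(name))
    scope = scope.enclosing;

  // the global object is a plain object, not a Map
  return scope instanceof Map ? scope.get(name) : scope[name];
}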

Note that, without being able to use non-local variable access (since that is what we are implementing), we must attach the scope of strange to the xPlus1 function object. This isn’t just some hack; it is a fundamental part of the implementation of languages with lexically-scoped first-class functions. More generally, we can establish the following relationship (pardon my ASCII-art):

Function-scope
  | *        ^ 0 or 1
  |          |
  | call of  | enclosing
  |          |
  V 1        | 1
Function-object
  | *
  |
  | evaluation of
  |
  V 1
Function-literal

Each function literal can be evaluated any number of times, with each evaluation producing a function object that is associated with its enclosing scope. Each of those function objects can be called any number of times, each of those calls producing a scope. When using the language, it is easy to see just a single concept, “function”, but hopefully this illustrates that there are really three “function” concepts at play here: scope, object, and literal.
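Ordinary closures show the three concepts side by side. In this small sketch (makeCounter and next are illustrative names), there is one function literal for next, two function objects produced by evaluating it twice, and a separate enclosing scope behind each object.

```javascript
function makeCounter() {
  // Each call of makeCounter produces a fresh function scope holding 'count'.
  let count = 0;

  // One function literal; it is evaluated once per call of makeCounter,
  // producing a new function object tied to the scope created above.
  return function next() {
    count += 1;
    return count;
  };
}

const a = makeCounter();
const b = makeCounter();

console.log(a === b);       // prints false: two function objects, one literal
console.log(a(), a(), b()); // prints 1 2 1: each object has its own enclosing scope
```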

With these changes, we have successfully dealt with the ravages of eval, but at what cost? Each variable access involves a call to an algorithm that iteratively performs hash-table lookups! Fortunately, this problem isn’t that different from object-property lookup and the same type of optimizations apply: hidden classes and caches. I won’t go into these techniques, as there are already two great explanations available. (Caching has been used to speed up name access since Firefox 3.) Even with these optimizations, however, name lookup isn’t as fast as we’d like it to be and we are still creating a Map object on every call.

In summary, we’ve handled the worst case, but we’d like to do better in code that doesn’t exercise the worst case.

Fast local name access

Now let’s optimize local variable access when all accesses are local. With this constraint, JavaScript starts to look like C and we can use some of the same techniques as a C compiler: store all variables in a stack and access variables by their offset in the stack.

As a first (highly garbalicious) iteration, we create an array for each set of arguments and vars, thereby turning

foo(13, 42);

function foo(x,y) {
  var a = x + y;
  return bar(a);
}

into:

foo([13, 42]);

function foo(args) {
  var vars = [undefined];
  vars[0] = args[0] + args[1];
  return bar([vars[0]]);
}

The second step is to avoid creating all those temporary arrays by using one big array, shared by all active functions. There are many ways to do this (corresponding to different calling conventions); we’ll just do something simple here:

// executed some time before the first function call:
var stack = [];

stack.push(13);
stack.push(42);
foo(/* number of arguments pushed = */ 2);

function foo(numArgs) {
  // push missing arguments, pop extra arguments
  for (var i = numArgs; i < 2; i++)
    stack.push(undefined);
  for (var i = numArgs; i > 2; i--)
    stack.pop();

  // analogous to the frame pointer register
  var firstLocal = stack.length;

  // push local 'a'
  stack.push(undefined);

  // var a = x + y:
  stack[firstLocal] = stack[firstLocal - 2] + stack[firstLocal - 1];

  // prepare stack for call to 'bar(a)':
  stack.push(stack[firstLocal]);
  var returnValue = bar(/* number of arguments pushed = */ 1);

  // in this calling convention, the callee pops its locals and arguments
  stack.pop(); // pop 'a'
  stack.pop(); // pop 'y'
  stack.pop(); // pop 'x'
  return returnValue;
}

With this strategy, a JIT compiler can do some pretty great optimization. To start with, each read from or write to stack in the above JS can be compiled down to a single CPU load or store. This is achieved by caching the address of stack[firstLocal] in a register and rolling the remaining “+ INDEX” into the load instruction as an offset. Even better, modern JavaScript JIT compilers do register allocation which can avoid the loads/stores altogether. (Register allocation has been in Firefox since version 3.5.)

In summary, we can do pretty efficient things for local variable access, but only with some stringent restrictions.

Fast non-local access

While we shouldn’t expect great performance when functions call eval or arguments, the requirement made in the previous section that we only access local variables is pretty harsh and conflicts with both the functional and module patterns of JavaScript programming. In this section, we’ll optimize non-local access.

We start with the observation that, in the absence of eval and other weirdos, there is no need for a fully dynamic scope lookup: we can know exactly where on the scope chain to find the variable being accessed. The first step is to view each top-level function as a tree of nested functions, giving each node (function) in the tree an array of the variables defined in its scope. For example, given this function:

function add3(arg1, arg2, arg3) {
  function addInner(innerArg1) {
    function innermost() { return innerArg1 + arg2 + getArg3() };
    return innermost();
  }
  function getArg3() {
    return arg3;
  }
  return addInner(arg1);
}

we can distill the following tree:

function add3: [arg1, arg2, arg3, addInner, getArg3]
 |\_ function addInner: [innerArg1, innermost]
 |    \_ function innermost: []
  \_ function getArg3: []

The next step is to include uses as leaves of the tree that are linked to the innermost enclosing definition with the same name. Rather than drawing terrible ASCII-art arrows, let’s represent a use-to-definition arrow with a two-number coordinate:

  • hops = the number of nodes in the tree to skip to get to the function node whose array contains the definition.
  • index = the index of the definition in the function node’s array.

Linking uses to definitions in the above tree produces:

function add3: [arg1, arg2, arg3, addInner, getArg3]
 |\_ function addInner: [innerArg1, innermost]
 |    |\_ function innermost: []
 |    |    |\_ "innerArg1"   {hops=1, index=0}
 |    |    |\_ "arg2"        {hops=2, index=1}
 |    |     \_ "getArg3"     {hops=2, index=4}
 |     \_ "innermost":       {hops=0, index=1}
 |\_ function getArg3: []
 |     \_ "arg3"             {hops=1, index=2}
 |\_ "addInner"              {hops=0, index=3}
 |\_ "getArg3"               {hops=0, index=4}
  \_ "arg1"                  {hops=0, index=0}
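The linking step can be sketched in a few lines. This is only an illustration of the idea, not the SpiderMonkey frontend: the node shape ({names, parent}) and the resolve function are assumptions made for this example.

```javascript
// Each node lists the names defined in that function and points at its parent.
const add3Node      = { names: ["arg1", "arg2", "arg3", "addInner", "getArg3"], parent: null };
const addInnerNode  = { names: ["innerArg1", "innermost"], parent: add3Node };
const innermostNode = { names: [], parent: addInnerNode };
const getArg3Node   = { names: [], parent: add3Node };

// Find the innermost enclosing definition of 'name', counting nodes skipped.
function resolve(node, name) {
  let hops = 0;
  for (let n = node; n !== null; n = n.parent, hops++) {
    const index = n.names.indexOf(name);
    if (index !== -1) return { hops, index };
  }
  return null; // not defined in any enclosing function (e.g. a global)
}

console.log(resolve(innermostNode, "arg2")); // prints { hops: 2, index: 1 }
console.log(resolve(getArg3Node, "arg3"));   // prints { hops: 1, index: 2 }
```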

As a last step, we’ll erase all variables that only have local uses. We can also remove entire scopes if they are empty; we just need to be mindful not to include these removed scopes in any hops count. Applying this last transformation produces the following, final tree:

function add3: [arg2, arg3, getArg3]
 |\_ function addInner: [innerArg1]
 |     \_ function innermost: 
 |         |\_ "innerArg1"   {hops=0, index=0}
 |         |\_ "arg2"        {hops=1, index=0}
 |          \_ "getArg3"     {hops=1, index=2}
 |\_ function getArg3: 
 |     \_ "arg3"             {hops=0, index=1}
  \_ "getArg3"               {hops=0, index=2}

With this analysis, we have all the information we need to efficiently compile the program. For the local-only variables that we removed in the last step, we can use the stack directly (as in the second section). For variables with non-local access, we can represent the scope chain as a linked list of scopes (as in the first section), except this time we represent scopes as arrays instead of maps. To compile an access, we use the {hops,index} coordinate: hops tells us how many .enclosing links to follow, index tells us the index in the array.
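In isolation, an access through a coordinate amounts to the following. The helper readCoordinate and the sample values are made up for this sketch; a real compiler would emit the link-following inline, since hops and index are known at compile time.

```javascript
// Follow 'hops' enclosing links, then index into the resulting scope array.
function readCoordinate(scope, hops, index) {
  while (hops > 0) {
    scope = scope.enclosing;
    hops--;
  }
  return scope[index];
}

// Scopes shaped like the final tree: add3 is [arg2, arg3, getArg3]
// and addInner is [innerArg1]. The numbers are sample argument values.
const add3Scope = [20, 30, undefined];
add3Scope.enclosing = null; // stands in for the global object

const addInnerScope = [10];
addInnerScope.enclosing = add3Scope;

// 'innermost' has no scope of its own, so its accesses start at addInner's scope.
const innerArg1 = readCoordinate(addInnerScope, 0, 0);
const arg2      = readCoordinate(addInnerScope, 1, 0);
const arg3      = readCoordinate(addInnerScope, 1, 1);

console.log(innerArg1 + arg2 + arg3); // prints 60
```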

Applying this scheme to the original example (and eliding the missing/extra arguments boilerplate) produces the following translated JS:

function add3() {
  var firstLocal = stack.length;

  // the optimized scope of add3 is: [arg2, arg3, getArg3]
  var scope = [stack[firstLocal-2], stack[firstLocal-1], undefined];
  scope.enclosing = window;

  // initialize 'addInner':
  stack.push(function addInner() {
    var firstLocal = stack.length;

    // the optimized scope of addInner is: [innerArg1]
    var scope = [stack[firstLocal - 1]];
    scope.enclosing = addInner.enclosing;

    // push local 'innermost'
    stack.push(function innermost() {
      // the scope of innermost is completely optimized away
      var scope = innermost.enclosing;

      // return innerArg1 {hops=0, index=0} +
      //        arg2      {hops=1, index=0} +
      //        getArg3() {hops=1, index=2}
      return scope[0] +
             scope.enclosing[0] +
             (scope.enclosing[2])();
    });
    stack[firstLocal].enclosing = scope;

    // return innermost()
    var returnValue = (stack[firstLocal])();
    stack.pop();  // pop 'innermost'
    stack.pop();  // pop 'innerArg1'
    return returnValue;
  });
  stack[firstLocal].enclosing = scope;

  // initialize 'getArg3' {hops=0, index=2}:
  scope[2] = function getArg3() {
    // the scope of getArg3 is completely optimized away
    var scope = getArg3.enclosing;

    // return arg3 {hops=0, index=1}
    return scope[1];
  }
  scope[2].enclosing = scope;

  // return addInner(arg1)
  stack.push(stack[firstLocal - 3]);
  var returnValue = (stack[firstLocal])();
  stack.pop();  // pop 'addInner'
  stack.pop();  // pop 'arg3'
  stack.pop();  // pop 'arg2'
  stack.pop();  // pop 'arg1'
  return returnValue;
}

JS performance experts will point out that putting a named property on an array triggers a deoptimization in some JS engines (including, until bug 586842 lands, SpiderMonkey). Let’s pretend it doesn’t; after all, we could just reserve scope[0] as the enclosing link.
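A sketch of that reserved-slot variant, with hypothetical helper names (makeScope, readSlot): the enclosing link lives in slot 0, so every variable index from the analysis is shifted up by one.

```javascript
const ENCLOSING = 0; // slot 0 of every scope array holds the enclosing scope

function makeScope(enclosing, values) {
  return [enclosing, ...values];
}

function readSlot(scope, hops, index) {
  for (; hops > 0; hops--)
    scope = scope[ENCLOSING];
  return scope[index + 1]; // skip over the reserved slot
}

// add3's scope is [arg2, arg3]; addInner's scope is [innerArg1].
const outerScope = makeScope(null, ["arg2 value", "arg3 value"]);
const innerScope = makeScope(outerScope, ["innerArg1 value"]);

console.log(readSlot(innerScope, 0, 0)); // prints innerArg1 value
console.log(readSlot(innerScope, 1, 1)); // prints arg3 value
```

The scopes stay plain dense arrays, at the cost of one extra element per scope.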

This strategy is good for JIT compilation in several ways:

  • If a variable is only accessed locally, it can still live on the stack and receive full JIT optimization.
  • Each .enclosing expression compiles to a single load instruction. Furthermore, when there are multiple accesses to variables in the same scope, the compiler can factor out the common scope walking.
  • Since a non-local name access in this scheme is much simpler than the name cache mentioned earlier, IonMonkey is more able to apply the optimizations it uses for local names such as LICM, GVN, and DCE.

In summary, we’ve now optimized non-local access while keeping local access fast. There are several other optimizations related to scopes that soften the blow when eval or arguments is used, but I think this is a good stopping point.

Next steps

The recent scope project basically catches us up to the level of other JS VMs. I should also note that functional languages have been doing similar optimizations forever. Looking forward, there are some straightforward optimizations I think we could do to avoid creating scope objects as well as more advanced optimizations we can lift from the functional crowd.

In SpiderMonkey

If you are interested in seeing the code for all this in SpiderMonkey, you can use the following links to get started:

  • The {hops,index} coordinate is called ScopeCoordinate.
  • The various scope objects are described in this ASCII-art tree. (Note, for mostly historical reasons, we use the same underlying representation for objects and scopes. Due to the Shape mechanism (which is pre-generated for scopes at compile-time), scopes are still, effectively, arrays.)
  • Optimized non-local access is performed with ALIASEDVAR opcodes. See the implementation of these ops in the interpreter and IonMonkey jit.
  • The frontend name analysis is a bit old and messy (and will hopefully be rewritten sometime in the near future). However, the important part of the analysis is at the very end, when we emit the ALIASEDVAR ops in EmitAliasedVarOp.

JSRuntime is now officially single-threaded

Given this title, a reasonable reaction would be:

Wait, wait, single threaded?!  But isn’t that, like, the wrong direction for the multicore present and manycore future?

so let me start by clearing this up:

A single SpiderMonkey runtime (that is, instance of JSRuntime) — and all the objects, strings and contexts associated with it — may only be accessed by a single thread at any given time. However, a SpiderMonkey embedding may create multiple runtimes in the same process (each of which may be accessed by a different thread).

That means it is up to the embedding to provide communication (if any) between the runtimes via JSNative or other SpiderMonkey hooks. One working example is the new implementation of web workers in Firefox which uses a runtime per worker. Niko Matsakis is experimenting with a different architecture in his new parallel JS project.

So that’s the quick summary. Now, for the interested, I’ll back up and explain the situation, how we got here, and where we are going in more detail.

Ghosts of SpiderMonkey past

In the beginning, as Brendan explains, Java-style big-shared-mutable-heap concurrency was all the rage and so, as Java’s kid brother, SpiderMonkey also had big-shared-mutable-heap concurrency. Now, locks weren’t ever (afaik) exposed to JS as part of SpiderMonkey, but an embedding could add them easily with a JSNative. However, SpiderMonkey did support concurrent atomic operations on objects with a clever (patented, even) locking scheme that avoided synchronization overhead for most operations.

This initial threading design stayed in place until about a year before Firefox 4.0 when the compartments project picked up steam. The key new concept introduced by this project was, well, the compartment. A runtime contains a set of compartments and each compartment contains a set of objects. Every object is in exactly one compartment and any reference between objects in different compartments must go through a wrapper. With compartments and wrappers, you can implement a sort of membrane that is useful for all kinds of things: GC, security boundaries, JIT compilation invariants, memory accounting, and JS proxies. Overall, I would say that compartments are one honking great idea.
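The membrane idea can be imitated in plain JS with proxies. This is only an analogy, not how SpiderMonkey’s cross-compartment wrappers are implemented, and makeMembrane is a name invented for the sketch: every object reachable through the membrane comes back wrapped, and the boundary can be cut in one place.

```javascript
// Minimal read-only membrane: property reads that cross the boundary
// return wrapped objects, and revoke() severs every wrapper at once.
function makeMembrane(target) {
  let revoked = false;
  const wrappers = new WeakMap(); // one wrapper per underlying object

  function wrap(value) {
    const isObject = value !== null &&
      (typeof value === "object" || typeof value === "function");
    if (!isObject) return value; // primitives cross the boundary as-is
    if (wrappers.has(value)) return wrappers.get(value);

    const proxy = new Proxy(value, {
      get(obj, key) {
        if (revoked) throw new TypeError("membrane revoked");
        return wrap(Reflect.get(obj, key));
      },
    });
    wrappers.set(value, proxy);
    return proxy;
  }

  return { wrapped: wrap(target), revoke() { revoked = true; } };
}

const secret = { n: 1 };
const membrane = makeMembrane({ secret });

console.log(membrane.wrapped.secret === secret); // prints false: we got a wrapper
console.log(membrane.wrapped.secret.n);          // prints 1
```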

The important thing about compartments for this story, though, is that the implementation effort really wanted single-threaded access to everything in a compartment. To be honest, I don’t know the particular technical issue raised at the time, but it isn’t hard to see how single-threaded-ness was a necessary simplification for such a challenging plan (viz., shipping compartments with Firefox 4). Anyway, the decision was made and compartments became single-threaded.

After Firefox 4 another great choice was made to rewrite Firefox’s implementation of web workers to not use XPConnect and to instead create a new runtime per worker. The choice was made because, even though a runtime allowed multi-threaded execution, there were still some global bottlenecks such as GC and allocation that were killing workers’ parallelism.

I’m Talking About Drawing a Line in the Sand

With web workers in separate runtimes, there were no significant multi-threaded runtime uses remaining. Furthermore, to achieve single-threaded compartments, the platform features that allowed JS to easily ship a closure off to another thread had been removed since closures fundamentally carry with them a reference to their original enclosing scope. Even non-Mozilla SpiderMonkey embeddings had reportedly experienced problems that pushed them toward a similar shared-nothing design. Thus, there was little reason to maintain the non-trivial complexity caused by multi-threading support.

There are a lot of things that “would be nice” but what pushed us over the edge is that a single-threaded runtime allows us to hoist a lot of data currently stored per-compartment into the runtime. This provides immediate memory savings and also enables another big change we want to make that would create a lot more compartments (and thus needs compartments to be lighter-weight).

Thus, the decision was made to try to make SpiderMonkey single-threaded as an API requirement. A bug was filed in April 2011 and an announcement was made on dev.tech.js-engine a month after.

Across this line you do not…

April 2011 to… January 2012… what took so long?

Well, to begin with, there were quite a few minor uses of JSRuntime off the main thread that had to be chased down. Also, each of these cases required understanding new parts of the codebase and, in several cases, waiting a few months for other kind-hearted, but very busy, Mozillians to fix things for me. The biggest problem was xpcom proxies (not to be confused with JS proxies, which are awesome). Fortunately, Benjamin Smedberg already had a beef with xpcom/proxy and (just recently) nuked the whole directory from orbit.

After getting try server to pass without hitting any of the 10,000 places where single-thread-ness gets verified in debug builds, we couldn’t exactly just rip out the multi-threading support. The worry we had was that some popular add-ons would break the whole browser and we’d be faced with an uncomfortable backout situation. Thus, we landed a simple patch that asserts single-threaded-ness in a few pinch points in release builds and waited for the assert to make its way to a bigger audience. (I think this is a great example of how the rapid-release process enables developers.)

As of right now, the assert is in Firefox 10 Beta and slated to be released on January 31st. There are four known offending extensions:

  • The IcedTea Java plugin on Linux seems to hit the assert for some applets. [Update: this was reported fixed in version 1.2pre]
  • BExternal.dll and gemgecko.dll are touching the main-thread only pref service off the main thread (already a bug) which ends up calling a JS observer. [Update: Both Gemius and Babylon seem to have shipped fixes]
  • [UPDATE] The DivX plugin.

Based on these results, we are concluding that the invariant “stuck” and thus we can actually make changes that assume single-threaded-ness. Indeed, the first bomb has been dropped (taking 2200 lines of code along with it, and this is just the beginning).

The single-threaded invariant in detail

Each runtime now has an “owner thread” (JSRuntime::ownerThread). The “owner thread” is the id of the only OS thread allowed to touch the runtime. This owner thread is set to the current thread (PR_GetCurrentThread()) from JS_NewRuntime and may only be changed, when no JS is executing in the runtime, via JS_ClearRuntimeThread/JS_SetRuntimeThread. Virtually every JSAPI function that takes a JSContext parameter will assert that PR_GetCurrentThread() == cx->runtime->ownerThread().

It should be mentioned that there are still a few remaining sources of concurrency:

  • The background-sweeping thread cleans up garbage objects which don’t have finalizers. Its interaction with the VM is pretty well isolated to the GC.
  • Pretty much the only JSAPI function that can be called off the main thread is JS_TriggerOperationCallback. This is how the watchdog thread stops runaway JS. Fortunately, the interaction with the VM is through a single field: JSRuntime::interrupt.

One last thing to point out is that SpiderMonkey’s architecture will likely continue evolving to meet new concurrency needs. Indeed, concurrent runtimes may one day return in a more restricted and structured form. But maybe not; we’ll see.

Ubuntu+GNOME Shell: open-source ecosystem win

I recently upgraded to Ubuntu 11.10.  That means I finally had to bite the bullet and figure out what I wanted to do about this whole new fancy-shell business; do I reject these encroachments on my established workflow or embrace the new hotness?  Trying not to be too much of a Luddite, I decided to make the leap to the new hotness.

But where to leap to?  Unity or GNOME Shell?  I expected that this would be some sort of big decision that determined what apps I could run and would take a lot of time if I wanted to try it out. But no, getting both Unity and GNOME Shell on the same desktop was as easy as sudo apt-get install gnome-shell from a vanilla upgrade of Ubuntu. Toying around with each was as simple as choosing from the drop-down list in the login manager. Everything just worked.

Now, maybe you’re all like “Duh, that’s what it should be; that you would be impressed by this shows how broken the system is and how your thinking is warped. Eyhhh”. Yeah, well I’m like whatever. Consider all the things that had to come together to make this possible. You need the separation of window manager, desktop environment, and applications; you need a packaging discipline that lets all this coexist without clobbering each other; you need software broken into nice little pieces; you need Canonical making good choices, etc.

In the end (after a few hours of experimenting with my workflow) I prefer GNOME Shell (to Unity but also to GNOME 2). In theory, I should switch to Fedora since it uses GNOME Shell, but I’ve been really happy with Ubuntu as a whole. Thus, I’m sticking with an Ubuntu+GNOME Shell hybrid and I think it’s awesome that I have that choice. Not strictly, but this seems like the type of thing that can only happen in the distributed-development model of open source software. There are still all sorts of problems to be solved for the Linux desktop, but things like this give me hope for the long-term outlook of the desktop and of the broader open-source ecosystem.

Boot To Gecko misconceptions

I’m all jazz hands about Boot To Gecko (B2G). I think B2G is really important to the Mozilla mission.  Perhaps stemming from the early-and-open nature of B2G, there are some misconceptions about B2G that I’ve seen in articles and forums. I am not closely involved in the project, but I do know enough to identify and correct a few of these misconceptions with the following three B2G facts:

  1. B2G will not run in kernel mode.  To be clear, B2G will run on top of the Linux kernel; Gecko will run as user-mode processes.  Furthermore, a crash in Gecko will not take down the entire phone: with Electrolysis (already being used in Firefox Mobile), different apps/sites will run in different processes.
  2. B2G will (ultimately) not run on top of Android.  To bootstrap the project, work is currently being done on top of Android.  However, the goal is to incrementally remove each dependency on Android, leaving only drivers and low-level libraries.  In particular, this means B2G would not contain the Dalvik Java VM which should significantly improve the patent-encumbered Java situation as well as reduce the number of VMs needed to browse the web from 2 to 1.
  3. B2G will use Gecko, but it’s not just about Gecko.  A clearer name might have been “Boot to Web platform”.  Gecko will, of course, be the engine used to prototype new Web APIs but since these are targeted at open standards developed in the open (as opposed to dumped in the open), a possible/desirable outcome is a separate “Boot To Webkit” implementation able to run the same home screen and apps as B2G.

If you are excited, feel free to contribute to the project; it’s just starting and there are many important problems to be solved.

Old Dijkstra Essays Considered

Sometime when I was an undergraduate, I came across a news article announcing that all the EWDs had been published online.  After figuring out what EWD meant (EWD are Dijkstra’s initials; he used “EWD” followed by a number to index his essays), I started reading a few and got hooked.  For having an ivory-tower sort of reputation (stemming from opinions on formal verification and how computer science should be taught), it was surprising how personable, humble and clear his writing is.  Many of the essays are handwritten and even his handwriting style is delightful (someone even turned it into a font).

For fun, I recently decided to re-read some of the outstanding EWDs to see how they sounded to me now; for the last few years I’ve been hacking on the Mozilla JS engine (SpiderMonkey) so presumably my perspective on software development has changed.  To my surprise, instead of finding the essays milder than remembered (as happens with movies that used to be terrifying (Critters) or incredibly awesome (Dino-Riders)), I found some new things relevant to what is happening now in SpiderMonkey.  That’s what I’d like to share in this post.

One of the most famous EWDs, if not the most famous, is “Notes on Structured Programming” (NOSP).  This essay is the expansion of the famous “Go To Statement Considered Harmful” CACM article that started the “X considered harmful” and, more generally, “X considered Y” memes.  Primarily, NOSP makes a well-reasoned argument for why it is better to use structured control flow constructs (like if/then/else and loops) than goto.  Great, we all get that, so much so that goto is now often rejected a priori despite having legitimate (albeit uncommon) applications (cf. Knuth’s “Structured programming with go to statements”).

However, NOSP’s agenda is broader than just browbeating us into not using goto.  Dijkstra frames the goto issue with some general notes on making programs simpler and in the process states two principles of program simplicity that I really like.  He doesn’t announce them as principles — they’re just sentences buried in the middle of paragraphs — but they seem like principles to me… so I’ll call them principles.  Often, judging the simplicity of a patch or piece of code feels highly subjective and more a matter of taste than anything.  However, I was delighted to see how many simplifications (considering recent SpiderMonkey patches I’ve written or seen flying by) could be viewed as following from these two principles.

Let’s start with the first principle, taken from the following quote:

In vague terms we may state the desirability that the structure of the program text reflects the structure of the computation.  Or, in other terms, ‘What can we do to shorten the conceptual gap between the static program text (spread out in “text space”) and the corresponding computations (evolving in time)?’

For the specific case of goto, Dijkstra proceeds to spell out in great detail how structured control flow admits an unequivocally shorter conceptual gap.

In SpiderMonkey, many recent simplifications have simply been removing old uses of goto (e.g., check out js_GC before a heroic set of patches by Jason Orendorff, as well as obj_eval and js_Invoke before several incremental rewritings).  I should note that switching SpiderMonkey to C++ a few years ago has been invaluable in control flow refactoring since with C++ comes RAII which has been used in practically every big function cleanup since.
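To make the RAII point concrete, here is a minimal sketch of the kind of rewrite I mean.  The names (Runtime, AutoEnterGC, collect) are made up for illustration and are not the real SpiderMonkey code.  In the goto version, every failure path would jump to a shared cleanup label; here a guard object's destructor does the cleanup, so plain early returns are safe.

```cpp
#include <cassert>

// Hypothetical stand-in for VM state that must be restored on every exit path.
struct Runtime {
    int gcDepth = 0;      // > 0 while a collection is in progress
    int collections = 0;  // number of completed collections
};

// RAII guard: the constructor enters the "in GC" state and the destructor
// leaves it, so no exit path can forget the cleanup.
class AutoEnterGC {
    Runtime& rt_;
  public:
    explicit AutoEnterGC(Runtime& rt) : rt_(rt) { ++rt_.gcDepth; }
    ~AutoEnterGC() { --rt_.gcDepth; }
    AutoEnterGC(const AutoEnterGC&) = delete;
    AutoEnterGC& operator=(const AutoEnterGC&) = delete;
};

// With goto, each failure below would be "goto out;" with the decrement at
// the label.  With the guard, the structure of the text matches the
// structure of the computation: enter, do work, leave.
bool collect(Runtime& rt, bool markOk, bool sweepOk) {
    if (rt.gcDepth > 0)
        return false;          // already collecting; nothing to undo
    AutoEnterGC guard(rt);
    assert(rt.gcDepth == 1);
    if (!markOk)
        return false;          // guard's destructor restores gcDepth
    if (!sweepOk)
        return false;          // same here
    ++rt.collections;
    return true;
}
```

Whether collect succeeds or bails out early, gcDepth is back to its previous value once the function returns, and that fact can be checked by reading the guard class alone rather than every exit path.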

But there is a lot more to this “shortening the syntax/computation gap” than control flow level refactoring.  It seems to me that trying to encode more invariants and program structure in static types falls under this category since static typing entails statements of the form “for all executions, variables of this type must have some property”.  One banner example is Chris Leary’s recent refactoring of JSAtomList et al into something far more typeful.  As demonstrated in his patch, C++ types often end up simulating the discriminated unions found in higher-level languages (or, more generally, ADTs).  Another example of this is the type representing a JavaScript value.
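As a rough illustration of a C++ type simulating a discriminated union, here is a toy value type.  It is a sketch under my own assumptions: the names and the tag-plus-union layout are invented for this example and are not how SpiderMonkey actually represents values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified value type: a tag plus a payload union.
class Value {
    enum class Tag : uint8_t { Undefined, Int32, Double, Boolean };
    Tag tag_;
    union {
        int32_t i32_;
        double dbl_;
        bool bool_;
    };
    explicit Value(Tag t) : tag_(t), dbl_(0) {}

  public:
    // Construction always sets the tag and the matching payload together.
    static Value undefined() { return Value(Tag::Undefined); }
    static Value fromInt32(int32_t i) { Value v(Tag::Int32); v.i32_ = i; return v; }
    static Value fromDouble(double d) { Value v(Tag::Double); v.dbl_ = d; return v; }
    static Value fromBoolean(bool b) { Value v(Tag::Boolean); v.bool_ = b; return v; }

    bool isUndefined() const { return tag_ == Tag::Undefined; }
    bool isInt32() const { return tag_ == Tag::Int32; }
    bool isDouble() const { return tag_ == Tag::Double; }
    bool isBoolean() const { return tag_ == Tag::Boolean; }

    // Each accessor asserts the tag, so "this payload is only meaningful
    // for this kind of value" is stated once, in the type.
    int32_t toInt32() const { assert(isInt32()); return i32_; }
    double toDouble() const { assert(isDouble()); return dbl_; }
    bool toBoolean() const { assert(isBoolean()); return bool_; }
};
```

The raw union is private, so callers can only reach a payload through an accessor that checks the tag first.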

Because of the, to put it kindly, dependently typed nature of many central SpiderMonkey data structures, types must often be bolted on, like an exoskeleton, to an underlying representation that is being treated like a bag of bits.  Here, C++’s unsafe casts, inlining, and templates are critical for avoiding performance penalties for using abstraction.  The data structure that holds the JS call stack is one example.  Another example is the recently-refactored set of types iteratively concocted by Gregor Wagner, Igor Bukanov, and Bill McCloskey to abstract GC data structures.

Another interesting non-standard application of types can be seen in the recent refactoring of strings.  Here, the C++ class hierarchy captures the logical hierarchy of string invariants.  Unlike an ordinary C++ class hierarchy, the string hierarchy contains no virtual functions and all instances of the hierarchy are necessarily the same size.  JSObject is incrementally growing a similar hierarchy (which I hope continues!).  In both cases, the C++ type system is being used (and sometimes abused) to provide the desired connection between static types and dynamic properties of strings/objects.
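Here is a small sketch of what such a hierarchy can look like.  It is heavily simplified and hypothetical: the class names, fields and the two-kind split are assumptions for illustration, not the real string layout.  The points it tries to show are the ones above: no virtual functions, every class the same size, and subclasses that exist only to name an invariant.

```cpp
#include <cassert>
#include <cstddef>

class LinearString;
class RopeString;

// Base class: owns the whole representation.  Subclasses add no fields.
class String {
  protected:
    enum Kind { Linear, Rope };
    Kind kind_;
    std::size_t length_;
    union {
        const char* chars_;  // meaningful when kind_ == Linear
        String* left_;       // meaningful when kind_ == Rope
    };
    String* right_;          // meaningful when kind_ == Rope

    String(Kind k, std::size_t len)
      : kind_(k), length_(len), chars_(nullptr), right_(nullptr) {}

  public:
    std::size_t length() const { return length_; }
    bool isLinear() const { return kind_ == Linear; }
    bool isRope() const { return kind_ == Rope; }

    // Checked downcasts: the dynamic property is asserted once, after
    // which the static type carries it.
    inline LinearString& asLinear();
    inline RopeString& asRope();
};

// Invariant: characters are stored contiguously.
class LinearString : public String {
  public:
    LinearString(const char* chars, std::size_t len) : String(Linear, len) { chars_ = chars; }
    const char* chars() const { return chars_; }
};

// Invariant: the string is the concatenation of two child strings.
class RopeString : public String {
  public:
    RopeString(String* l, String* r) : String(Rope, l->length() + r->length()) {
        left_ = l;
        right_ = r;
    }
    String* left() const { return left_; }
    String* right() const { return right_; }
};

static_assert(sizeof(LinearString) == sizeof(String), "subclasses must not add state");
static_assert(sizeof(RopeString) == sizeof(String), "subclasses must not add state");

inline LinearString& String::asLinear() {
    assert(isLinear());
    return *static_cast<LinearString*>(this);
}

inline RopeString& String::asRope() {
    assert(isRope());
    return *static_cast<RopeString*>(this);
}
```

A function that takes a LinearString& no longer needs to check or document that its argument has contiguous characters, because the parameter type says so.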

Finally, in the worst case, no language or type system will help you simplify the mapping from syntax to dynamic computation; you just need to suck it up and completely change tack.  An admirable example of this last year was Andreas Gal’s transformation of jsiter.  It is a beautiful thing when a patch, in total, removes 500 lines and SunSpider gets 4% faster.

The second principle contained in NOSP that I’d like to point out comes from the following quote:

Eventually, one of our aims is to make such well-structured programs that the intellectual effort (measured in some loose sense) needed to understand them is proportional to program length (measured in some equally loose sense).  In particular, we have to guard against an exploding appeal to enumerative reasoning, [...]

To give a bit of context, “enumerative reasoning” is defined in the essay to mean the mode of reasoning where you are forced to consider every possible case and make sure the desired property holds for each of them individually.  Dijkstra contrasts this to the more desirable “abstractive” and “inductive” reasoning and explains how building up structured control flow allows their use instead of enumerative reasoning.

One example of this in SpiderMonkey is typified by pretty much any work that Jeff Walden does.  In his patches (and, as the style spreads, others’ too), each step of the implementation is made to correspond closely to the individual steps of the corresponding ECMAScript sections, complete with comments labeling the steps.  This reduces enumerative reasoning by allowing one to judge the spec-conformance of an individual statement instead of an entire function.
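For a flavor of this style, here is a sketch of ToInt32 written step by step.  The step comments paraphrase ES5 section 9.5 from memory, and the function is my own illustration rather than the engine's actual implementation; it assumes the caller has already performed step 1 (ToNumber) and passes in a double.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// ES5 9.5 ToInt32, with one statement group per spec step.
int32_t ToInt32(double number) {
    // Step 1 (ToNumber) is assumed to have been done by the caller.

    // Step 2: NaN, +0, -0, +Infinity and -Infinity all map to +0.
    if (std::isnan(number) || std::isinf(number) || number == 0)
        return 0;

    // Step 3: posInt = sign(number) * floor(abs(number)).
    double posInt = std::trunc(number);

    // Step 4: int32bit = posInt modulo 2^32, as a non-negative value.
    const double two32 = 4294967296.0;
    double int32bit = std::fmod(posInt, two32);
    if (int32bit < 0)
        int32bit += two32;
    assert(int32bit >= 0 && int32bit < two32);

    // Step 5: values at or above 2^31 wrap around to negative.
    const double two31 = 2147483648.0;
    if (int32bit >= two31)
        return static_cast<int32_t>(int32bit - two32);
    return static_cast<int32_t>(int32bit);
}
```

Each commented block can be compared against its spec step in isolation, which is exactly the reduction in enumerative reasoning described above.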

Another way enumerative reasoning is reduced is by simply having less code.  This of course is Reusability, the Holy Grail of Software Engineering (apparently, a solved problem).  One of the biggest examples of this I can think of is the security wrapper rewrite that was part of the mammoth compartmentalization effort.  Allegedly, thousands of lines of mind-bending security-critical code were removed, replaced by a small set of composable policy templates.  And if that wasn’t enough, harmony:proxies got to ride along at a bargain rate!

The last class of enumerative-reasoning-reducing changes that comes to mind consists of those that cut down on the number of VM-wide concepts (be they states, requirements, corner cases, gotchas, possibilities, etc.) that must be kept in mind to effectively work on SpiderMonkey.  Examples from recent memory include the removal of: JSScope, slow natives, watchpoint hacks, JSObjectMap, GC-from-malloc, concurrent JSRuntime access, local rooting scopes, newborn roots, display optimizations, multi-threaded objects and titles, multi-threaded strings, dormant stack frames, __count__, __parent__, non-identifier atoms, heap-doubles, null-is-an-object, the no-int-valued-doubles-in-values rule… and those are just things that appeared on my radar; a simple bugzilla search for “remove” shows a lot more.

Altogether, I think this demonstrates that these two simplification principles cover a lot of real-world patches.  It’s neat to find them nestled in a 40-year-old essay written when goto still ruled the land.

On a side note, I think continual simplification is vital to maintaining a healthy, long-lived codebase.  Thus, I see the size of the (far from complete) set of simplifications listed above as a very good sign for an already rather long-lived codebase.

I’ll conclude with the summary of a section in NOSP entitled “On Our Inability To Do Much”:

Summarizing: as a slow-witted human being I have a very small head and I had better learn to live with it and to respect my limitations and give them full credit, rather than to try to ignore them, for the latter vain effort will be punished by failure.

new mozilla::indonesia::Kumi()

Over the weekend I constructed a 3D Kumi from this 2D pattern, which my two-year-old has been so kind as to demonstrate in a picture:

3D Kumi

If you were wondering, Kumi is a creation of the Mozilla Indonesia community.  You can’t quite see it from the photo, but Kumi is wearing the traditional dress of a Balinese Kecak dancer.  The Mozilla Indonesia community has also created a number of other variations on Kumi that feature different cultural regions.  Another fun fact that you may have already seen is that Firefox market share in Indonesia is incredibly high, somewhere around 80%!  Clearly, this is a pretty hip group :-)

A little over a month ago, I had the chance to visit Indonesia along with my wife and some of my Mountain View based coworkers (Christian Legnitto, Dave Mandelin and the Bieber-esque David Anderson) for the local Firefox 4 release parties.  It was an amazing experience and the country was really hospitable.  In particular, we had two incredible hosts.  One was Viking Karwur, a freelance web-developer in Jakarta, who I think is something like a general or commander in the Mozilla Indonesia community.  He worked with a bunch of different local groups in Indonesia to plan some really successful release parties; if you look under the “Largest Firefox Communities” header on the Firefox 4 Release Party site on meetup.com, 7 of the cities are in Indonesia!  The other gracious host was Yofie Setiawan, another freelance web-developer in Jakarta.  Yofie was kind enough to take Jenn and me around the city, including a market so Jenn could find some Batik cloth to take back home.

One great thing about the trip was getting to hear what aspects of Firefox mattered to the Indonesian community.  Pretty much the number one thing I heard was Firefox memory usage, so I’m glad that Nicholas Nethercote and others have started this new MemShrink push (you can see a stream of updates on Nick’s blog).  I also think this Electrolysis effort should help since closing a process should hopefully sweep away any leaked garbage associated with a page when it closes.  Also, on a practical level, I wonder how much of a free perception boost Chrome gets since its total resource usage is spread between many processes that may not all be visible to the user when they open Task Manager / Activity Monitor.

I also heard a lot of requests for Firefox Mobile on BlackBerry… nothing much positive I can say there… but also Firefox Mobile on iPhone.  Now, Apple Terms and Conditions seem to clearly lock us out of putting Firefox Mobile in the app store so I think it’s great how we’ve taken the positive/constructive route by creating Firefox Home.  But, and I’m totally shooting in the dark here, wouldn’t it be cool if we put real development effort behind a Firefox Mobile that ran on jail-broken iPhones?  Is that an illegal activity?  Would that cause our legit Firefox Home app to get booted?  If nothing else, it seems like it would be useful for Mozilla to rattle the gates here to point out how Apple is locking out a browser that IMHO offers a superior mobile browsing experience (it’s the only reason I have and continue to use my Motorola Atrix).

While I am shooting uninformed questions out into the InterWebz, another question came up while in Indonesia: this Kumi artwork that I linked to at the top of my post is a really neat fusion of Mozilla and local Indonesian culture.  Are there any good extension points in the Indonesian Firefox where Kumi could be featured?  The first thought that came to mind is in the “About Firefox” window, perhaps in the whitespace to the bottom-left of the giant official Firefox logo.  I know there is value in having a uniform experience across Firefoxen, but perhaps there is a sensible balance, or perhaps such extension points already exist.

Oh, we also got to feed monkeys!