
Automatically scheduled JS tasks—with joins!

When I spoke at the Mountain View JS meetup about upcoming features in JS the other week, I showed a slide demonstrating how you can build cooperative threads with generators, in order to do asynchronous I/O in sequence without having to chain callbacks. I got a great question about this: is it possible to do several operations concurrently but then block until they all finish? Or do you have to force every I/O operation into a sequence? In other words, can you implement joins?

So I figured it was time to make my jstask library a little more fully-featured. I’ve implemented a task scheduler, so that tasks don’t have to be manually resumed. And I’ve implemented join, so that you can chain some operations, but still run others concurrently. For example:

spawn(function() {
    var foo, bar, baz;
    // Spawn all three reads concurrently, then block until every one
    // has finished.
    yield join(spawn(function() { foo = yield read("foo.txt"); }),
               spawn(function() { bar = yield read("bar.txt"); }),
               spawn(function() { baz = yield read("baz.txt"); }));
    // foo, bar, and baz are all initialized here.
    // ...
});

I’ve designed the library so that if a task throws an exception and another task is joined on it, the exception propagates to the joiner. So you can simply write:

spawn(function() {
    var foo, bar, baz;
    try {
        yield join(spawn(function() { foo = yield read("foo.txt"); }),
                   spawn(function() { bar = yield read("bar.txt"); }),
                   spawn(function() { baz = yield read("baz.txt"); }));
    } catch (e) {
        // ...
    }
    // ...
});

If any one of the subtasks throws an exception, the others are automatically cancelled and the exception propagates to the outer task.

This might not be the only policy that makes sense. I hope to keep experimenting with jstask, and it’s up on GitHub and available for forking (no pun intended). Maybe we’ll discover other useful policies. This is one of the cool things about generators: they’re general and flexible enough to support multiple different libraries, policies, and applications.

structs.js

I’ve written a first draft prototype of structs.js, the binary data library we’re working on for the ECMAScript standard. It’s implemented using typed arrays, but it’s my hope that it can eventually subsume the important use cases of typed arrays. (Typed arrays let you do questionable things like cast data in platform-specific ways.)
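To give a feel for it, here’s a minimal sketch of the kind of struct-type API the proposal has been exploring; the names StructType and uint32 follow the ECMAScript binary data strawman, so they may not match this draft exactly:

// A hedged sketch: StructType and uint32 are strawman names, not
// necessarily the identifiers in this draft of structs.js.
var Point = new StructType({ x: uint32, y: uint32 });

var p = new Point({ x: 10, y: 20 });
p.x;     // 10
p.y = 5; // writes through to the underlying binary storage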

For now the implementation only works in Firefox 4, because it’s using proxies. I have ideas about how to make it work in Safari and Chrome by backing off from proxies if they’re not available, and maybe even IE by shimming typed arrays using ordinary arrays.

Try it out! Feel free to send me feedback directly, or to es-discuss.

My JS meetup talk

On February 9th I gave a talk at the Mountain View JavaScript meetup, hosted by Google, on some of the cool features we’re working on for the next version of ECMAScript. Max Walker from Marakana, Inc. did a beautiful job recording and editing the video.

Caveat: there’s one point in the video where I claim that in-browser modules loaded from the same URL shouldn’t share the same instance (aka “singletons”). Kevin Dangoor has recently prodded me to reconsider this, and I think he’s right: I believe we can share instances more like CommonJS, even in the browser. I’m still working on some of the details but I think it’s gonna be awesome.

A test for proper tail calls

Here are two versions of the same little function — or, I should say, almost the same little function:

function nonTail(n) {
    if (n > 0)
        nonTail(n - 1);
    return;
}

function tail(n) {
    if (n > 0)
        return tail(n - 1);
}

Neither function does anything particularly interesting, but it’s the way they do what they do that’s interesting. They’re both recursive, but only the second one is tail-recursive: it calls itself as the final result of a return statement. So even for very large values of n, in an implementation with proper tail calls, the tail-recursive function shouldn’t die with a stack overflow.

Here’s a test case to determine whether an engine supports proper tail calls. It should be fairly robust, even in the face of engines with different stack implementations and different VM limits: the first part of the test tries to determine experimentally what is a likely maximum stack depth for this function. (It’s doing a binary search, so it should be bounded above by about 31 × k stack frames, where k is the engine’s maximum stack depth for nonTail.) The second part then tries running the tail-recursive function with that depth and checks whether the engine throws an exception.

var lo = 0;
var hi = Math.pow(2, 31) - 1;

// Binary search for the smallest depth at which nonTail blows the stack.
while (hi !== lo) {
    var mid = lo + (((hi - lo) / 2) | 0);
    try {
        nonTail(mid);
        lo = mid + 1;  // mid frames fit; look deeper
    } catch (e) {
        hi = mid;      // mid overflowed; look shallower
    }
}

var min = lo;

// With proper tail calls, tail() needs only constant stack space, so it
// should survive the overflow depth, and even twice that depth.
try {
    tail(min);
    tail(min * 2);
    print("test passed!");
} catch (e) {
    print("test failed!");
}

As we work on test suites for ECMAScript conformance, this kind of test should be useful for checking whether engines support proper tail calls.

PS I should point out that no current browser JS engines support proper tail calls, so right now this test will fail in all of them.

Proper tail calls in Harmony

In the last ECMAScript committee meeting, we promoted the proper tail calls proposal from “strawman” to official “proposal” status. There’s still work to do to make it into a polished spec, and it ain’t over till it ships, but I’m really excited that we could be making proper tail calls a guaranteed feature of JavaScript.

In short, proper tail calls make it possible for a function to delegate its result to a function call without allocating unnecessary space in the runtime system. In most languages, function calls always require allocating a stack frame, and stack frames are only deallocated on return. But this implementation approach isn’t necessary and even makes it impossible for programmers to write certain kinds of programs. Language implementations that guarantee proper tail calls promise that, whenever a function delegates to a function call to provide its final result (according to a fixed set of rules about what such delegation looks like syntactically), the runtime system will not indefinitely keep the caller’s frame allocated while the callee runs.

Why does this matter? For one, it makes it possible to break up large while or for loops into separate functions that tail-call one another, passing their state around as arguments instead of updating a shared piece of state. This makes it easier to understand the state and to build mock states in order to test the pieces of the algorithm independently. (I wonder if it might even open up opportunities to parallelize pieces of an algorithm into separate web workers. I’ve got another post coming on how some of Brendan’s recent ideas for Harmony might help with web workers.) Generally speaking, tail calls make it possible to modularize program control. Whether it’s just for the occasional one-off delegation (which proper tail calls make as space-efficient as a goto), or for writing in full-on continuation-passing style (which, without proper tail calls, is more or less destined to blow the stack), this offers more flexibility for programmers to write more modular code.
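As a toy sketch of that loop-splitting idea (mine, not from the proposal), here is a loop rewritten as two functions that pass their state as an argument and delegate to each other with tail calls:

// Instead of a while loop flipping a shared `even` flag, each function
// carries the state in its argument and delegates with a tail call.
function isEven(n) {
    if (n === 0) return true;
    return isOdd(n - 1);   // tail call: the caller's frame can be freed
}

function isOdd(n) {
    if (n === 0) return false;
    return isEven(n - 1);  // tail call
}

// With proper tail calls, isEven(10000000) runs in constant stack space;
// without them, it overflows the stack.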

The other people who stand to gain are compiler-writers. After all, compiling to JavaScript is getting more popular all the time. Tail calls make it possible to simulate lots of different kinds of control constructs in other programming languages. People have tried using setTimeout with a delay of 0, but browsers throttle these events with a minimum timeout of around 5 to 10 milliseconds (apparently, to support web pages that rely on the delay for animations). Having an officially guaranteed tail call mechanism makes it possible to compile control constructs like continuations, coroutines, threads, and actors. And, of course, it’s useful for compiling source languages with tail calls!
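For the record, here is roughly what that setTimeout workaround looks like (a sketch of the general technique, not code from any particular compiler): every tail call is deferred to the event loop, which empties the stack between steps, but the timer clamping makes each step painfully slow.

// A sketch of the setTimeout(0) workaround. The stack never grows, but
// each step pays the browser's 5-10ms minimum timeout.
function isEvenAsync(n, k) {
    if (n === 0) return k(true);
    setTimeout(function() { isOddAsync(n - 1, k); }, 0);  // "tail call"
}

function isOddAsync(n, k) {
    if (n === 0) return k(false);
    setTimeout(function() { isEvenAsync(n - 1, k); }, 0);
}

isEvenAsync(1000, function(result) { alert(result); });  // true, eventually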

A word about stack traces: some people worry that tail calls interfere with getting useful stack traces. And it’s true that it can be useful to track the complete call history, including tail calls. When necessary, there are simple workarounds programmers can use to turn tail calls into non-tail calls. But a better answer is for VM writers to provide smarter stack traces that track only a finite number of the most recent tail calls. This satisfies the requirement of bounding the amount of allocation (since it’s a bounded amount of history) but still keeps useful diagnostic information in the stack trace. In the end, it’s true that this is a trade-off. In my experience, the cost is worth the benefit of being able to write better delegation and control abstractions.

Zaphod: A Browser Language Lab for JS

When I started full-time at Mozilla this January, one of the first projects I dreamed of starting was an experimental version of JavaScript that would be easy to hack and experiment with. Ideally, it would be embeddable in Firefox so that we could really try it out in the wild. My hope was that this would be useful for experimenting with JS features and ECMAScript committee proposals, and for seeing how they interact on the web. But, the best laid plans of mice and men being what they are, I never seemed to find the time for the project.

Enter our summer intern Tom Austin! Tom spent the first part of his summer helping us revive the Narcissus project, which is a JavaScript interpreter written in JavaScript. Brendan Eich wrote Narcissus several years ago as an experiment, but at that point it required a special build of SpiderMonkey to enable some low-level hooks. This meant that you couldn’t use Narcissus without building it from scratch. Tom worked on replacing the dependency on these hooks with the new meta-programming APIs coming in Firefox 4 so that Narcissus can be used with the standard build of SpiderMonkey.

Once Narcissus started getting back into shape, Tom spent the rest of his summer building Zaphod, which is now available as an experimental Mozilla Labs addon for Firefox. Zaphod lets you choose Narcissus as the scripting engine for a <script> element by specifying the attribute type="application/narcissus". It also lets you switch the default engine from SpiderMonkey to Narcissus so that you can run existing web pages through Narcissus.
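For example (a sketch), opting a single script into Narcissus looks like this:

<script type="application/narcissus">
  // This code is interpreted by Narcissus rather than SpiderMonkey.
  alert("hello from Narcissus!");
</script>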

Zaphod promises to be a great tool for getting hands-on experience with experimental JS language features. Initially, it will be useful to us internally for rapidly piloting language features in the browser. Down the road, it may also become a vehicle for giving the community early access to these ideas, so they can try them out and give us feedback. And of course, it’s all open source, so anyone who wants to join in on the effort is most welcome.

An API for parsing JavaScript

This week we landed an experimental API for parsing JavaScript source code in new builds of the SpiderMonkey shell. For now, you have to download and build SpiderMonkey from source to use it, but hopefully we’ll include it in future versions of Firefox.

The parser API provides a single function:

Reflect.parse(src[, filename=null[, lineno=1]])

Reflect.parse takes a source string (and optionally, a filename and starting line number for source location metadata), and produces a JavaScript object representing the abstract syntax tree of the parsed source code, using the built-in parser of SpiderMonkey itself. Straightforward enough, but behind this simple entry point is a thorough API that covers the entirety of SpiderMonkey’s abstract syntax. In short, anything that SpiderMonkey can parse, you can parse, too. Developer tools generally need a parse tree, and JavaScript isn’t an easy language to parse. With this API, it becomes much easier to write tools like syntax highlighters, static analyses, style checkers, etc. And because Reflect.parse uses the same parser that SpiderMonkey uses, it’s guaranteed to be compatible.

Here’s a simple example:

js> var ast = Reflect.parse("obj.foo + 42");
js> var expr = ast.body[0].expression;
js> expr.left.property
({loc:null, type:"Identifier", name:"foo"})
js> expr.right
({
    loc: {
        source: null,
        start: {
            line: 1,
            column: 10
        },
        end: {
            line: 1,
            column: 12
        }
    },
    type: "Literal",
    value: 42
})
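
As a taste of what you can do with the result, here’s a little sketch (not part of the API) that collects all identifier names in a program with a generic recursive walk over the AST, relying only on the node shapes shown above:

function identifiers(src) {
    var names = [];
    (function walk(node) {
        if (!node || typeof node !== "object")
            return;
        if (node.type === "Identifier")
            names.push(node.name);
        for (var key in node)  // visit every child, including array elements
            walk(node[key]);
    })(Reflect.parse(src));
    return names;
}

identifiers("obj.foo + 42"); // ["obj", "foo"]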

Try it out, and feel free to give me feedback!

Module scoping and linking

The ECMAScript TC39 committee had its regular meeting this week. Among the things we talked about were some revisions that Sam Tobin-Hochstadt and I have been discussing for the Harmony module system.

In previous iterations, the design allowed nested modules to refer outside their bodies only to other modules. For example:

    module A {
        var foo = 42;
        module B { ... }
    }

Module B would be allowed to refer to A, but not foo. This would also be true if module B could be loaded from an external file:

    module A {
        var foo = 42;
        module B = load "b.js";
    }

On es-discuss, Jasvir Nagra and Ihab Awad raised the concern that this design renders externally-loaded modules too sensitive to external scopes, similar to textual inclusion (aka #include). To be fair, they would only be sensitive to modules in scope, since other bindings would be removed from scope. But the point still stands: non-local changes to the lexical environment in external files would affect externally loaded modules. It’s not too hard to come up with plausible scenarios where this could bite programmers.

The revision we discussed proposes separating the cases of externally loaded modules and lexically nested modules. For one thing, in the first example above, why shouldn’t B be able to refer to foo? It’s right there! That’s just good old lexical scope. But if B is loaded externally, it should not be sensitive to a lexical environment of unbounded size. But at the same time, we still need to provide the ability to link together modules, even with cyclic dependencies, and place them in separate files. For example, an MVC app might be divided up into a few sub-modules:

    module MyApp {
        ...
        module Model = load "model.js";
        module View = load "view.js";
        module Controller = load "controller.js";
        ...
    }

The individual sub-modules ought to be able to import from one another, even cyclically — for example, the view and the controller might be mutually dependent. In essence, MyApp is creating a local module graph, consisting of its immediately nested module declarations.

So instead of inheriting the entire lexical environment of the code loading them, the modules Model, View, and Controller should be given a lexical environment with just the modules bound in the local module graph. That way, within every component of a program, you can easily divide it up into separate pieces, each of which may depend on other pieces. But at the same time, because they only see the other modules nested within MyApp, the sub-modules aren’t sensitive to code changes in the rest of the program.
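Concretely (a sketch with made-up names, since the surface syntax was still being discussed), view.js could refer to its sibling directly:

// view.js: the lexical environment here contains exactly the modules
// bound in MyApp's local module graph: Model, View, and Controller.
export function render(data) {
    Controller.validate(data);  // a sibling module, even though the
                                // dependency is cyclic
    // ...
}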

JavaScript needs modules!, ctd

I generally try to make my slides visual aids instead of a crib sheet, which means they aren’t typically self-contained. So I should provide a little context for my earlier post on the Harmony module system.

First things first: this design has been a collaboration with Sam Tobin-Hochstadt, who’s been utterly indispensable, and it has benefited from discussions with the TC39 committee and the es-discuss mailing list.

Second things second: we are not in competition with CommonJS, nor do we intend to replace it! CommonJS has done great work building a framework for creating reusable components in JavaScript. Now, they didn’t have the luxury of changing the language. We do, and we have a responsibility to make use of it to make JavaScript — and the entire open web — a better development environment.

One of the best practices on the web is Doug Crockford’s “module pattern,” which is a coding pattern for writing unobtrusive components that avoid stepping on the toes of other components. Design patterns are a recurring theme on the web: innovative users find workarounds to the limitations of web technologies, and as these patterns of use become popular, browser vendors and standards organizations can and should provide built-in support to ultimately alleviate the need for the workarounds altogether.

So it should be for modules! Instead of having to go through the pain of creating and immediately calling functions in order to get better scoping semantics, and to save and restore global bindings to be unobtrusive, JavaScript should have better scoping and modularity from the outset.
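To make the contrast concrete, here’s a sketch (with illustrative names, and module syntax that was still in flux) of the same tiny component written both ways:

// Today: the module pattern, creating and immediately calling a
// function just to get a private scope.
var Counter = (function() {
    var count = 0;                 // private; not a global
    return { next: function() { return ++count; } };
})();

// With built-in modules: no function-call ceremony required.
module Counter {
    var count = 0;                 // private to the module
    export function next() { return ++count; }
}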

The modules in this design are second-class, that is, they are a static declaration form rather than a general expression form. This means that the imports and exports of a module are known statically to the compiler, which makes them more amenable to supporting lexical scope.

The general forms for using modules are simple and intuitive: a module declaration binds a module, and the export form determines which definitions within it are shared with the outside world. You can access the exports of another module simply by selecting them with the usual dot-syntax (such as UI.Widget), or you can bind local variables to the exports of another module with the import syntax.
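A sketch of those two access forms (binding names are illustrative, and the exact syntax was subject to change):

module UI {
    export function Widget(name) { this.name = name; }
}

var w = new UI.Widget("button");   // select an export with dot-syntax

import UI.{Widget};                // bind a local variable to an export
var w2 = new Widget("slider");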

Modules can have cyclic dependencies, as long as the modules are all in scope with one another — much like function declarations.

These basic, simple features are part of the theme of usability and minimal intrusion into the development process: putting code in a module should be as convenient a refactoring step as possible.

The next theme of the design is moving JavaScript towards true lexical scope. JavaScript has always had a global object at the top of the “scope chain” (i.e., the lexical environment), which means that undeclared variables are looked up dynamically in a shared object. As a result, the language’s default behavior is to create global variables. This is easy to trip over, and it makes it inconvenient to write modular programs.
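The classic way to trip over this (a sketch):

function tally(items) {
    total = 0;                     // oops: missing `var`, so this silently
                                   // creates (or clobbers) a global
    for (var i = 0; i < items.length; i++)
        total += items[i];
    return total;
}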

The TC39 committee has already decided that the next version of JavaScript will make some backwards-incompatible semantic changes to the language — within reason — and only be enabled by opting in to a new language version. We’ve also agreed that ES5’s strict mode will be a baseline, i.e., the default semantics, for the new version. This eliminates two of the pieces of JavaScript that currently break lexical scope: with-statements and eval modifying its caller’s scope. The last remaining offender is the global object. With this new language version, we can finally eliminate that, too. The benefit is that JavaScript becomes lexically scoped at last — no more runtime errors due to accidentally-omitted var declarations or misspelled variable names!

Module declarations are scoped just like other declarations. They can be declared inline, they can be simple renamings of existing modules, or they can be loaded from external files. This latter form has much the same semantics as a script tag, except that it can be done from within JavaScript.
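In sketch form (file and module names illustrative), the three forms look like this:

module M {                  // declared inline
    export var answer = 42;
}

module N = M;               // a simple renaming of an existing module

module P = load "p.js";     // loaded from an external file, with much
                            // the same semantics as a script tag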

Notice that the client gets to name the module it loads, instead of the module naming itself. This is critical: it means that modules don’t have to try to contend for a global namespace and use painful naming techniques for defensively avoiding stepping on each other’s toes. They can also contain nested modules and happily name them whatever they like, without contention.

The last piece of the design is dynamic loading. Just because modules are second-class does not mean they cannot be dynamically represented as values. In fact, module instances are easily reflected as objects simply by referring to them in expressions. Module instance objects are essentially immutable “views” of module instances; you cannot modify the exports of a module instance, but you can dynamically query their properties just like any object.
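A sketch of that reflection in action (names illustrative):

module M {
    export var version = "1.0";
}

var m = M;          // mentioning M in an expression reflects it as an object
m.version;          // "1.0": exports are readable like ordinary properties
m.version = "2.0";  // rejected: module instance objects are immutable views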

Module loaders give you the ability to dynamically load modules, and even to create multiple isolated contexts in which to load them. You can even provide low-level hooks to create custom loading semantics for a given module loader. Dynamic loading makes it possible to achieve important web use cases like dynamic feature detection, conditional loading, or on-demand loading. At the more intricate end of the spectrum, you can do live upgrades, implement isolated and stratified execution environments like online IDEs, and write security-critical code that runs untrusted code in restricted contexts.
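Every name in the following sketch is hypothetical (the loader API was still being designed at this point), but it shows the flavor of dynamic, isolated loading:

// Hypothetical API: Loader.create, the hooks object, and the load
// method are all placeholders, not the proposed names.
var sandbox = Loader.create({
    // custom hooks could go here: name resolution, fetching, etc.
});

sandbox.load("widget.js", function(Widget) {
    Widget.init();             // module loaded on demand, isolated
                               // in its own context
}, function(error) {
    // handle a failed or rejected load
});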

In all, the point of our module proposal is to make it incredibly easy to create JavaScript components and share them, and to make JavaScript a truly lexically scoped language. In a larger sense, the goal is to continue making it easier to share code, because when we lower the barriers to entry, more people do more awesome things with the web all the time. And that’s why we’re here.

JavaScript needs modules!

I’ve just given my breakout session, “JavaScript needs modules!”, at the Mozilla Summit. (Note: none of this is implemented yet, so please don’t take any of the code snippets too literally.)

The enthusiastic atmosphere at the summit has been palpable; it’s hard not to be in awe of my coworkers and fired up for what’s ahead for the web. For my part, I believe that modules are one of the most important upcoming features of JavaScript, and I also believe we can make them happen.