await syntax, it’s getting better all the time.
State of the Parallel Union
What I am talking about is unlocking the power lurking inside our devices: GPUs, SIMD instructions, and multiple processor cores. With the emerging WebGL 2.0 and SIMD standards, the Web is making significant progress on the first two. And Web Workers go some part of the way towards enabling multicore parallelism.
But workers are, by design, strongly isolated: they can only communicate via
On the other end of the spectrum, Mozilla Research and Intel Labs have done some experiments over the years with deterministic parallelism APIs (sometimes referred to as River Trail or PJS). The goal of these experiments was to find high-level abstractions that could enable parallel speedups without any of the pitfalls of threads. This is a difficult approach, because it’s hard to find high-level models that are general enough to suit a wide variety of parallel programs. And at least for the moment, PJS faces a difficult adoption challenge: JS engine implementors are reluctant to commit to a large implementation effort without more developer feedback, but developers can’t really put PJS through the paces without a good polyfill to try it out in real production apps.
An Extensible Web Approach to Parallel JS
In 2012, I co-signed the Extensible Web Manifesto, which urged browser vendors and standards bodies to prioritize basic, low-level, orthogonal primitives over high-level APIs. A key insight of the Extensible Web is that growing the platform incrementally actually enables faster progress because it allows Web developers to iterate quickly—faster than browser vendors and standards bodies can—on building better abstractions and APIs on top of the standardized primitives.
Turning back to parallelism, just such a low-level API has been in the air for a while. A couple years ago, Filip Pizlo and Ryosuke Niwa of Apple’s WebKit team discussed the possibility of a variation on
ArrayBuffer that could be shared between workers. Around the same time Thibault Imbert floated the same idea on his blog (perhaps inspired by similar functionality in Flash). At last year’s JSConf, Nick Bray of Google’s PNaCl team demo’ed a working prototype of shared buffers in Chrome.
Now, there’s no question such an API is low-level. Unlike PJS, a
SharedArrayBuffer type with built-ins for locking would introduce new forms of blocking to workers, as well as the possibility that some objects could be subject to data races. But unlike Nashorn, this is only true for objects that opt in to using shared memory as a backing store—if you create an object without using a shared buffer, you know for sure that it can never race. And workers do not automatically share memory; they have to coordinate up front to share an array buffer. As long as your top level worker code never accepts and uses a shared buffer, you are assured of the same amount of isolation between workers as ever.
Another sensible restriction, at least at this point, is to limit access to shared buffers to workers. Eventually, sharing buffers with the main thread, ideally in controlled ways, would be a logical extension. Exposing shared buffers to the main thread would increase power and allow us to connect parallel computations directly to Web APIs like
<canvas>. At the same time, the main thread has implementation challenges and could carry risks for the JS programming experience. It’s an important area to explore but it needs careful investigation.
And crucially, developers would be able to start building higher-level abstractions. As one example, I’ve sketched out API ideas for region-slicing, data-race-free sharing of portions of a single binary buffer, and this could easily be polyfilled with
Moreover, by providing high-performance primitives, different domain-specific abstractions can determine for themselves how to enforce their guarantees. Consider region-slicing, for example: the design represents regions as objects and shares them with workers via message-passing. For some cases, the hits of creating wrapper objects and passing messages would be negligible; others—say, a column-major multidimensional array—might require allocating and communicating so many region slices as to dominate any parallelism gains. Providing the low-level primitives empowers library authors to determine for themselves how to achieve their desired guarantees and what use cases to enable.
We’ve begun experimenting with a
SharedArrayBuffer API in SpiderMonkey. Lars Hansen is drafting a spec of the API we’re experimenting with, and we’ve provided a prototype implementation in Firefox Nightly builds. Our hope is that this will allow people to play with the API and give us feedback.
While there seems to be a good amount of interest in this direction, it will require more discussion with Web developers and browser implementers alike. With this post we’re hoping to encourage a wider conversation. We’ll be reaching out to solicit more discussion in standards forums, and we’d love to hear from anyone who’s interested in this space.