{"id":764,"date":"2015-02-26T20:11:38","date_gmt":"2015-02-26T20:11:38","guid":{"rendered":"http:\/\/blog.mozilla.org\/javascript\/?p=764"},"modified":"2015-02-26T20:11:38","modified_gmt":"2015-02-26T20:11:38","slug":"the-path-to-parallel-javascript","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/javascript\/2015\/02\/26\/the-path-to-parallel-javascript\/","title":{"rendered":"The Path to Parallel JavaScript"},"content":{"rendered":"<p>Between the <a href=\"http:\/\/wiki.ecmascript.org\/doku.php?id=harmony:specification_drafts\">coming release of ES6<\/a> and <a href=\"http:\/\/arewefastyet.com\/#machine=11&amp;view=single&amp;suite=octane&amp;subtest=Gameboy\">unrelenting<\/a> <a href=\"http:\/\/arewefastyet.com\/#machine=11&amp;view=single&amp;suite=octane&amp;subtest=RayTrace\">competition<\/a> <a href=\"http:\/\/arewefastyet.com\/#machine=11&amp;view=single&amp;suite=octane&amp;subtest=Splay\">for<\/a> <a href=\"http:\/\/arewefastyet.com\/#machine=11&amp;view=single&amp;suite=octane&amp;subtest=Typescript\">JIT<\/a> <a href=\"http:\/\/arewefastyet.com\/#machine=11&amp;view=single&amp;suite=octane&amp;subtest=zlib\">performance<\/a>, these are exciting times for JavaScript. But an area where JS still lags is <em>parallelism<\/em>\u2014exploiting hardware acceleration by running multiple computations simultaneously. I\u2019d like to present some experiments we\u2019ve been doing in SpiderMonkey with a <strong>low-level, evolutionary approach<\/strong> to extending JavaScript with <strong>more flexible and powerful primitives for parallelism<\/strong>.<\/p>\n<p>I should be clear that I\u2019m not talking about <em>concurrency<\/em>, which is about writing programs that respond to simultaneous events. 
JavaScript\u2019s asynchronous concurrency model is popular and successful, and with <a href=\"https:\/\/promisesaplus.com\/\">promises<\/a>, <a href=\"https:\/\/facebook.github.io\/regenerator\/\">ES6 generators<\/a>, and the upcoming <a href=\"https:\/\/github.com\/lukehoban\/ecmascript-asyncawait\"><code>async<\/code>\/<code>await<\/code><\/a> syntax, it\u2019s getting better all the time.<\/p>\n<h3>State of the Parallel Union<\/h3>\n<p>What I <em>am<\/em> talking about is unlocking the power lurking inside our devices: GPUs, SIMD instructions, and multiple processor cores. With the emerging <a href=\"https:\/\/www.khronos.org\/registry\/webgl\/specs\/latest\/2.0\/\">WebGL 2.0<\/a> and <a href=\"https:\/\/hacks.mozilla.org\/2014\/10\/introducing-simd-js\/\">SIMD<\/a> standards, the Web is making significant progress on the first two. And <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/API\/Web_Workers_API\/basic_usage\">Web Workers<\/a> go some part of the way towards enabling multicore parallelism.<\/p>\n<p>But workers are, by design, strongly isolated: they can only communicate via <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/API\/Worker\/postMessage\"><code>postMessage<\/code><\/a>. And for good reason! JavaScript\u2019s \u201crun-to-completion\u201d programming model is a central part of the programming experience: when your code runs in an event handler, the functions and methods that you call are the only code you have to worry about changing your app state. Nevertheless, this comes at a cost: when multiple threads want to coordinate, they repeatedly have to copy any data they need to communicate between each other. 
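To make the copying cost concrete, here is a small sketch (worker script name hypothetical): posting a large typed array's buffer to a worker structured-clones it by default, while naming it in the transfer list moves ownership instead of copying, detaching the buffer on the sending side.

```javascript
// Sketch: communicating a large buffer to a worker. The Worker API is
// browser-only, so the wiring is guarded to keep the snippet inert elsewhere.
const bytes = new Float64Array(1 << 20);   // ~8 MB of data
const buffer = bytes.buffer;

if (typeof Worker !== "undefined") {
  const worker = new Worker("crunch.js");  // hypothetical worker script

  // Default: structured clone -- the whole 8 MB payload is copied.
  worker.postMessage({ data: buffer });

  // Transferable: ownership moves instead, with no copy -- but `buffer`
  // is now detached and unusable on this side.
  worker.postMessage({ data: buffer }, [buffer]);
}
```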
The ability to <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/API\/Web_Workers_API\/Advanced_concepts_and_examples#Passing_data_by_transferring_ownership_%28transferable_objects%29\">transfer binary buffers<\/a> helps cut down on some of these copying costs, but for many apps this still just can\u2019t compete with the ability for multiple threads to write simultaneously into different parts of shared state. Even setting aside the costs of data transfer, message-passing itself has nontrivial latency. It\u2019s hard to compete with dedicated hardware instructions that allow threads to communicate directly through shared state.<\/p>\n<p>So where should we go from here? A radical option would be to bite the bullet and do what <a href=\"http:\/\/openjdk.java.net\/projects\/nashorn\/\">Nashorn<\/a> has done: turn JavaScript into a fully multi-threaded data model and call it a day. In Nashorn, <a href=\"https:\/\/blogs.oracle.com\/nashorn\/entry\/nashorn_multi_threading_and_mt\">nothing stops you from running multiple Java threads on a shared JavaScript environment<\/a>. Unless your host Java program is careful to synchronize your scripts, your JavaScript apps lose all the guarantees of run-to-completion. Frankly, I can\u2019t imagine considering such a step right now. 
Even setting aside the massive standardization and implementation work required, it\u2019s a huge ecosystem risk: every app, every library, every data structure ever written to date threatens to be subverted by imperfect (or malicious) uses of threads.<\/p>\n<p>On the other end of the spectrum, Mozilla Research and Intel Labs have done some experiments over the years with <a href=\"http:\/\/wiki.ecmascript.org\/doku.php?id=strawman:data_parallelism\">deterministic parallelism APIs<\/a> (sometimes referred to as <a href=\"https:\/\/github.com\/IntelLabs\/RiverTrail\">River Trail<\/a> or <a href=\"http:\/\/smallcultfollowing.com\/babysteps\/blog\/2014\/04\/24\/parallel-pipelines-for-js\/\">PJS<\/a>). The goal of these experiments was to find high-level abstractions that could enable parallel speedups without any of the pitfalls of threads. This is a difficult approach, because it\u2019s hard to find high-level models that are general enough to suit a wide variety of parallel programs. And at least for the moment, PJS faces a difficult adoption challenge: JS engine implementors are reluctant to commit to a large implementation effort without more developer feedback, but developers can\u2019t really put PJS through the paces without a good <a href=\"https:\/\/remysharp.com\/2010\/10\/08\/what-is-a-polyfill\">polyfill<\/a> to try it out in real production apps.<\/p>\n<h3>An Extensible Web Approach to Parallel JS<\/h3>\n<p>In 2012, I co-signed the <a href=\"https:\/\/extensiblewebmanifesto.org\/\">Extensible Web Manifesto<\/a>, which urged browser vendors and standards bodies to prioritize basic, low-level, orthogonal primitives over high-level APIs. 
A key insight of the Extensible Web is that growing the platform incrementally actually enables faster progress because it allows Web developers to iterate quickly\u2014faster than browser vendors and standards bodies can\u2014on building better abstractions and APIs on top of the standardized primitives.<\/p>\n<p>Turning back to parallelism, just such a low-level API has been in the air for a while. A couple years ago, Filip Pizlo and Ryosuke Niwa of Apple\u2019s WebKit team discussed the possibility of a variation on <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/JavaScript\/Reference\/Global_Objects\/ArrayBuffer\"><code>ArrayBuffer<\/code><\/a> that could be <a href=\"https:\/\/lists.webkit.org\/pipermail\/webkit-dev\/2013-April\/024682.html\">shared between workers<\/a>. Around the same time Thibault Imbert <a href=\"http:\/\/typedarray.org\/concurrency-in-javascript\/\">floated the same idea on his blog<\/a> (perhaps inspired by <a href=\"http:\/\/help.adobe.com\/en_US\/as3\/dev\/WS2f73111e7a180bd0-5856a8af1390d64d08c-7ffe.html#WS2f73111e7a180bd0-45a72d15139c696689c-8000\">similar functionality in Flash<\/a>). At last year\u2019s <a href=\"http:\/\/2014.jsconf.us\/speakers.html#bray\">JSConf<\/a>, Nick Bray of Google\u2019s PNaCl team demo\u2019ed a <a href=\"https:\/\/www.youtube.com\/watch?v=-xNZYr40QOk&amp;t=14m18s\">working prototype of shared buffers in Chrome<\/a>.<\/p>\n<p>Now, there\u2019s no question such an API is low-level. Unlike PJS, a <code>SharedArrayBuffer<\/code> type with built-ins for locking would introduce new forms of blocking to workers, as well as the possibility that some objects could be subject to <a href=\"http:\/\/blog.regehr.org\/archives\/490\">data races<\/a>. But unlike Nashorn, this is only true for objects that opt in to using shared memory as a backing store\u2014if you create an object without using a shared buffer, you know for sure that it can never race. 
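A rough sketch of what opting in looks like (the details here follow the prototype API and may differ from any eventual standard; the worker script name is hypothetical): only memory you explicitly back with a shared buffer can ever be seen by another thread, while ordinary buffers stay isolated as before.

```javascript
// Sketch of opt-in sharing, assuming the prototype SharedArrayBuffer API.
const shared = new SharedArrayBuffer(4 * 1024); // opt in: shared backing store
const counters = new Int32Array(shared);        // view onto the shared memory

const isolated = new Int32Array(1024);          // ordinary buffer: never shared

if (typeof Worker !== "undefined") {
  const worker = new Worker("peer.js");         // hypothetical worker script
  // The shared buffer is posted, not copied: both sides now see the same
  // underlying memory, and unsynchronized writes on one side can race
  // with reads on the other.
  worker.postMessage(shared);
}
```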
And workers do not automatically share memory; they have to coordinate up front to share an array buffer. As long as your top level worker code never accepts and uses a shared buffer, you are assured of the same amount of isolation between workers as ever.<\/p>\n<p>Another sensible restriction, at least at this point, is to limit access to shared buffers to workers. Eventually, sharing buffers with the main thread, ideally in controlled ways, would be a logical extension. Exposing shared buffers to the main thread would increase power and allow us to connect parallel computations directly to Web APIs like <code>&lt;canvas&gt;<\/code>. At the same time, the main thread has implementation challenges and could carry risks for the JS programming experience. It\u2019s an important area to explore but it needs careful investigation.<\/p>\n<p>So this approach is more conservative than full threading, and yet it should be more than enough to satisfy a large number of use cases\u2014from number-crunching to graphics processing to video decoding\u2014and with a much smaller implementation cost on engines than more ambitious solutions like PJS or threads. This would significantly move the needle on what JavaScript applications can do with workers, as well as open new opportunities for compiling threaded languages to the Web.<\/p>\n<p>And crucially, developers would be able to start building higher-level abstractions. As one example, I\u2019ve sketched out API ideas for <a href=\"https:\/\/gist.github.com\/dherman\/5463054\"><em>region-slicing<\/em><\/a>, data-race-free sharing of portions of a single binary buffer, and this could easily be polyfilled with <code>SharedArrayBuffer<\/code>. Similarly, multi-dimensional parallel array traversals, similar to PJS, could be polyfilled in plain JavaScript, instead of being blocked on standardization. Each of these APIs has pros and cons, including different use cases and performance trade-offs. 
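As one illustration of how such an abstraction could be polyfilled (helper and message names here are hypothetical): a minimal flavor of parallel traversal can partition an index space into disjoint ranges, one per worker, so that each worker writes only its own slice of a shared array and the scheme is data-race-free by construction.

```javascript
// Sketch: partition [0, length) into nearly equal, disjoint [start, end)
// ranges, one per worker. Because the ranges never overlap, concurrent
// writes into a shared array are race-free by construction.
function partition(length, workers) {
  const ranges = [];
  const chunk = Math.ceil(length / workers);
  for (let start = 0; start < length; start += chunk) {
    ranges.push([start, Math.min(start + chunk, length)]);
  }
  return ranges;
}

// Each worker would receive the shared buffer plus its range and fill
// only that slice (worker script and message shape are hypothetical):
//   onmessage = ({data: {buffer, start, end}}) => {
//     const out = new Float32Array(buffer);
//     for (let i = start; i < end; i++) out[i] = compute(i);
//     postMessage("done");
//   };
const shared = new SharedArrayBuffer(1024 * 4);
const ranges = partition(1024, 4);
```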
And the Extensible Web approach lets us experiment with and settle on these and other high-level abstractions <em>faster<\/em> than trying to standardize them directly.<\/p>\n<p>Moreover, by providing high-performance primitives, different domain-specific abstractions can determine for themselves how to enforce their guarantees. Consider region-slicing, for example: the design represents regions as objects and shares them with workers via message-passing. For some cases, the hits of creating wrapper objects and passing messages would be negligible; others\u2014say, a <a href=\"http:\/\/en.wikipedia.org\/wiki\/Row-major_order\">column-major<\/a> multidimensional array\u2014might require allocating and communicating so many region slices as to dominate any parallelism gains. Providing the low-level primitives empowers library authors to determine for themselves how to achieve their desired guarantees and what use cases to enable.<\/p>\n<h3>Next Steps<\/h3>\n<p>We\u2019ve begun experimenting with a <code>SharedArrayBuffer<\/code> API in SpiderMonkey. Lars Hansen is <a href=\"https:\/\/docs.google.com\/document\/d\/1NDGA_gZJ7M7w1Bh8S0AoDyEqwDdRh4uSoTPSNn77PFk\">drafting a spec<\/a> of the API we\u2019re experimenting with, and we\u2019ve provided a prototype implementation in <a href=\"https:\/\/nightly.mozilla.org\/\">Firefox Nightly<\/a> builds. Our hope is that this will allow people to play with the API and give us feedback.<\/p>\n<p>While there seems to be a good amount of interest in this direction, it will require more discussion with Web developers and browser implementers alike. With this post we\u2019re hoping to encourage a wider conversation. 
We\u2019ll be reaching out to solicit more discussion in standards forums, and we\u2019d love to hear from anyone who\u2019s interested in this space.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Between the coming release of ES6 and unrelenting competition for JIT performance, these are exciting times for JavaScript. But an area where JS still lags is parallelism\u2014exploiting hardware acceleration by running multiple computations simultaneously. I\u2019d like to present some experiments &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/javascript\/2015\/02\/26\/the-path-to-parallel-javascript\/\">Continue reading<\/a><\/p>\n","protected":false},"author":187,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/posts\/764"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/users\/187"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/comments?post=764"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/posts\/764\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/media?parent=764"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/categories?post=764"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/javascript\/wp-json\/wp\/v2\/tags?post=764"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}