WebGL in Web Workers, Today – and Faster than Expected!

Alon Zakai

Web Workers are additional threads that a website can create. Using workers, a website can utilize multiple CPU cores to speed itself up, or move heavy single-core processing to a background thread to keep the main (UI) thread as responsive as possible.

A problem, however, is that many APIs exist only on the main thread, for example WebGL. WebGL is a natural candidate for running in a worker, as it is often used by things like 3D games and simulations, which run heavy amounts of JavaScript that can stall the main thread. Therefore there have been discussions about supporting WebGL in workers (for example, work is ongoing in Firefox), and hopefully this will be widely supported eventually. But that will take some time – perhaps we can polyfill this meanwhile? That is what this blog post is about: WebGLWorker is a new open source project that makes the WebGL API available in workers. It does so by transparently proxying the necessary commands to the main thread, where WebGL is then rendered.

Quick overview of the rest of this post:

  • We’ll describe WebGLWorker’s design, and how it allows running WebGL-using code in workers, without any modifications to that code.
  • While the proxying approach has some inherent limitations which prevent us from implementing 100% of the WebGL API, the part that we can implement turns out to be sufficient for several real-world projects.
  • We’ll see performance numbers on the proxying approach used in WebGLWorker, showing that it is quite efficient, which is perhaps surprising since it sounds like it might be slow.

Examples

Before we get into technical details, here are two demos that show the project in action:

  • PlayCanvas: PlayCanvas is an open source 3D game engine written in JavaScript. Here is the normal version of one of their examples, and here is the worker version.
  • BananaBread: BananaBread is a port of the open source Cube 2/Sauerbraten first person shooter using Emscripten. Here is one of the levels running normally, and here it is in a worker. (Note that startup on the worker version may be sluggish; see the notes on “nested workers”, below, for why.)

In both cases the rendered output should look identical whether running on the main thread or in a worker. Note that neither of these two demos was modified to run in a worker: in both cases, the exact same code runs either on the main thread or in a worker (see the WebGLWorker repo for details and examples).

How it works

WebGLWorker has two parts: the worker and the client (the main thread). On the worker, we construct what looks like a normal WebGL context, so that a project using WebGL can use the WebGL API normally. When the project calls into that context, the WebGLWorker worker code queues rendering commands into buffers, then sends them over at the end of a frame using postMessage. The client code on the main thread receives these command buffers and executes them, command after command, in order. Here is what this architecture looks like:

[Figure: WebGLWorker architecture diagram]
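To make the queue-and-replay flow concrete, here is a minimal sketch in TypeScript. It is not WebGLWorker's actual code: the command format (a method name followed by its arguments) and the names WorkerSideQueue, executeFrame and GLLike are assumptions for illustration, and a plain callback stands in for postMessage so that the sketch runs outside a browser.

```typescript
// Minimal sketch of command proxying; NOT WebGLWorker's real implementation.
// Assumption: a command is [methodName, ...args], and `send` stands in for postMessage.
type Command = [string, ...unknown[]];

// The subset of a WebGL context the client needs: methods looked up by name.
interface GLLike {
  [method: string]: (...args: unknown[]) => unknown;
}

// Worker side: record commands instead of executing them.
class WorkerSideQueue {
  private commands: Command[] = [];
  private readonly send: (frame: Command[]) => void;

  constructor(send: (frame: Command[]) => void) {
    this.send = send;
  }

  // Called by the proxied context for each command that needs no return value.
  enqueue(name: string, ...args: unknown[]): void {
    this.commands.push([name, ...args]);
  }

  // Called once at the end of a frame: the whole batch goes out in one message.
  flushFrame(): void {
    const frame = this.commands;
    this.commands = [];
    this.send(frame);
  }
}

// Client (main thread) side: replay the batch, in order, on the real context.
function executeFrame(gl: GLLike, frame: Command[]): void {
  for (const [name, ...args] of frame) {
    gl[name](...args);
  }
}

// Usage: a fake context that records calls instead of drawing.
const log: string[] = [];
const fakeGL: GLLike = {
  clearColor: (...a) => { log.push(`clearColor(${a.join(",")})`); },
  clear: (...a) => { log.push(`clear(${a.join(",")})`); },
};
const queue = new WorkerSideQueue((frame) => executeFrame(fakeGL, frame));
queue.enqueue("clearColor", 0, 0, 0, 1);
queue.enqueue("clear", 16384); // 16384 == gl.COLOR_BUFFER_BIT
const beforeFlush = log.length; // still 0: nothing is sent until the frame ends
queue.flushFrame();
```

Batching per frame means one postMessage per frame rather than one per GL call, so the messaging cost scales with frames rather than with individual commands.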

Note that the WebGL-using codebase interacts with WebGLWorker’s worker code synchronously: it calls into it and receives responses back immediately (for example, createShader returns an object representing a shader). To do that, the worker code needs to parse shader source files and so forth. Basically, we end up implementing some of the “frontend” of WebGL ourselves, in JS. And, of course, on the client side the WebGLWorker client code interacts synchronously with the actual browser WebGL context, executing commands and saving the responses where relevant. The important thing to note, however, is that the WebGLWorker worker code sends messages to the client code using postMessage, but it cannot receive responses: a response would arrive asynchronously, while WebGL application code is written synchronously. So that arrow goes only in one direction.
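Returning objects synchronously is possible because the worker side can hand out its own handles instead of waiting for the real ones. The sketch below assumes a hypothetical protocol in which each object gets a numeric id on the worker, and the client keeps a table from those ids to the real WebGL objects; ProxyContext, ShaderHandle and replay are illustrative names, not WebGLWorker's API.

```typescript
// Minimal sketch, assuming a hypothetical id-based protocol (not WebGLWorker's real one):
// the worker creates handles synchronously; the client maps ids to real WebGL objects.
type Command = [string, ...unknown[]];

// What the application receives from createShader: a local stand-in object.
class ShaderHandle {
  readonly id: number;
  readonly type: number;
  source = ""; // kept locally, so e.g. getShaderSource could answer without a round trip

  constructor(id: number, type: number) {
    this.id = id;
    this.type = type;
  }
}

// Worker side: answers immediately and queues what the client must do later.
class ProxyContext {
  private nextId = 1;
  readonly commands: Command[] = [];

  createShader(type: number): ShaderHandle {
    const handle = new ShaderHandle(this.nextId++, type);
    this.commands.push(["createShader", handle.id, type]);
    return handle;
  }

  shaderSource(shader: ShaderHandle, source: string): void {
    shader.source = source;
    this.commands.push(["shaderSource", shader.id, source]);
  }
}

// Client side: the two calls of the real context that this sketch needs.
interface ShaderGL {
  createShader(type: number): object;
  shaderSource(shader: object, source: string): void;
}

// Replays commands, creating real objects and recording them under the worker's ids.
function replay(gl: ShaderGL, commands: Command[], table: Map<number, object>): void {
  for (const cmd of commands) {
    switch (cmd[0]) {
      case "createShader":
        table.set(cmd[1] as number, gl.createShader(cmd[2] as number));
        break;
      case "shaderSource": {
        const shader = table.get(cmd[1] as number);
        if (shader === undefined) throw new Error(`unknown shader id ${String(cmd[1])}`);
        gl.shaderSource(shader, cmd[2] as string);
        break;
      }
      default:
        throw new Error(`unsupported command ${cmd[0]}`);
    }
  }
}

// Usage: the app gets a usable handle before the main thread has done anything.
const ctx = new ProxyContext();
const vs = ctx.createShader(35633); // 35633 == gl.VERTEX_SHADER
ctx.shaderSource(vs, "void main() {}");
const received: string[] = [];
const fakeGL: ShaderGL = {
  createShader: (type) => ({ type }),
  shaderSource: (_shader, src) => { received.push(src); },
};
const table = new Map<number, object>();
replay(fakeGL, ctx.commands, table);
```

Nothing in this sketch ever waits for the client, which is what lets it run synchronously in the worker; it is also why it cannot answer questions that only the real context knows.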

For that reason, a major limitation of this approach is synchronous operations that we cannot implement ourselves, like readPixels and getError. In both of those, we don’t know the answer ourselves; we need a response from the actual WebGL context on the client, but we can’t access it synchronously from a worker. As a consequence, we do not support readPixels. For getError, we optimistically return “no error” in the worker where getError is called, proxy the call, and if the client gets an error back, we abort, since at that point it is too late to tell the worker. We are therefore limited in what we can accomplish with this approach. However, as the demos above show, two real-world projects work out of the box without issues.
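The optimistic getError behavior can be sketched as follows. This is a hypothetical illustration of the strategy just described (names such as ErrorProxy and checkErrorOnClient are made up), with a callback standing in for the real context's getError on the main thread.

```typescript
// Minimal sketch of the optimistic getError strategy; all names are hypothetical.
const NO_ERROR = 0; // same value as gl.NO_ERROR

type Command = [string, ...unknown[]];

// Worker side: report "no error" now, and ask the client to check for real later.
class ErrorProxy {
  readonly commands: Command[] = [];

  getError(): number {
    this.commands.push(["getError"]);
    return NO_ERROR;
  }
}

// Client side: the real answer arrives only after the worker has moved on,
// so the only remaining option for an actual error is to stop.
function checkErrorOnClient(realGetError: () => number): void {
  const err = realGetError();
  if (err !== NO_ERROR) {
    throw new Error(`WebGL error 0x${err.toString(16)} arrived too late for the worker; aborting`);
  }
}

// Usage
const proxy = new ErrorProxy();
const seenByApp = proxy.getError(); // 0, whatever the real context will report
let aborted = false;
try {
  checkErrorOnClient(() => 0x0502); // 0x0502 == gl.INVALID_OPERATION
} catch {
  aborted = true;
}
```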

Performance

I think most people’s intuition would be that this approach has to slow things down. After all, we are doing more work – queue commands in a buffer, serialize it, transfer it, deserialize it, and then execute the commands. All of that instead of just running each command as it is invoked! Now, on a single-core machine that would be correct, but on a multi-core machine there are reasons to suspect otherwise, because the worker thread often does other heavy JavaScript operations, so freeing it up quickly to do whatever else it needs can be beneficial. There are a few reasons why that might be possible:

  • Some commands are proxied literally by just appending a few numbers to an array. That can be faster than calling into a DOM API which crosses into native C++ code.
  • Some commands are handled locally in JavaScript, without proxying at all. Again, this can be faster than crossing the native code boundary.
  • Calling from the browser’s native code into the graphics driver can cause delays, which proxying avoids.
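As an illustration of the first point, a proxied call can be as cheap as pushing an opcode and its arguments onto a plain JS array. The opcode numbering, the use of numeric ids for uniform locations, and the names below are assumptions for this sketch; the real WebGLWorker encoding may differ.

```typescript
// Minimal sketch, assuming a hypothetical numeric opcode table; WebGLWorker's real
// encoding may differ. A proxied call becomes a few pushes onto a plain array.
const OP_UNIFORM1F = 1;
const OP_DRAW_ARRAYS = 2;

class NumericCommandBuffer {
  readonly data: number[] = [];

  // locationId is a hypothetical numeric id for a uniform location.
  uniform1f(locationId: number, x: number): void {
    this.data.push(OP_UNIFORM1F, locationId, x);
  }

  drawArrays(mode: number, first: number, count: number): void {
    this.data.push(OP_DRAW_ARRAYS, mode, first, count);
  }
}

// Client side: the calls this sketch can replay.
interface DrawGL {
  uniform1f(locationId: number, x: number): void;
  drawArrays(mode: number, first: number, count: number): void;
}

// Walks the flat array; each opcode says how many arguments follow it.
function decode(gl: DrawGL, data: number[]): number {
  let i = 0;
  let executed = 0;
  while (i < data.length) {
    const op = data[i++];
    switch (op) {
      case OP_UNIFORM1F:
        gl.uniform1f(data[i], data[i + 1]);
        i += 2;
        break;
      case OP_DRAW_ARRAYS:
        gl.drawArrays(data[i], data[i + 1], data[i + 2]);
        i += 3;
        break;
      default:
        throw new Error(`unknown opcode ${op}`);
    }
    executed++;
  }
  return executed;
}

// Usage
const buf = new NumericCommandBuffer();
buf.uniform1f(3, 0.5);
buf.drawArrays(4, 0, 36); // 4 == gl.TRIANGLES
const calls: string[] = [];
const count = decode({
  uniform1f: (loc, x) => { calls.push(`uniform1f ${loc} ${x}`); },
  drawArrays: (m, f, c) => { calls.push(`drawArrays ${m} ${f} ${c}`); },
}, buf.data);
```

On the worker, each call is just an array append in JavaScript; the crossing into native code happens later, on the main thread, when the client replays the buffer.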

So on the one hand it seems obvious that this must be slower, but there are also some reasons to think it might not be. Let’s see some measurements!

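[Chart: BananaBread frames per second, normal (main thread) vs. worker (WebGLWorker) versions, on Firefox and Chrome, with 0, 2 and 10 bots]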

The chart shows frames per second (higher numbers are better) on the BananaBread demo, on Firefox and Chrome (latest developer versions, Firefox 33 and Chrome 37, on Linux; results on other OSes are overall similar), for a normal version running on the main thread and a worker version using WebGLWorker. Frame rates are shown for different numbers of bots, with 2 being a “typical” workload for this game engine, 0 being a light workload and 10 being unrealistically large (note that “0 bots” does not mean “no work” – even with no bots, the game engine renders the world, the player’s model and weapon, HUD information, etc.). As expected, as we go from 0 bots to 2 and then 10, frame rates decrease a little, because the game does more work, both on the CPU (AI, physics, etc.) and on the GPU (rendering more characters, weapon effects, etc.). However, on the normal versions (not running in a worker), the decrease is fairly small, just a few frames per second under the optimal 60. On Firefox, we see similar results when running in a worker as well: even with 10 bots, the worker version is about as fast as the normal version. On Chrome, we do see a slowdown in the worker version, of just a few frames per second for reasonable workloads, but a larger one for 10 bots.

Overall, then, WebGLWorker and the proxying approach do fairly well: while our intuition might be that this must be slow, in practice, on reasonable workloads, the worker version is comparable in speed to running on the main thread. And it can perform well even on large workloads, as can be seen by 10 bots on the worker version on Firefox (Chrome’s slowdown there appears to be due to the proxying overhead being more expensive for it – it shows up high on profiles).

Note, by the way, that WebGLWorker could probably be optimized a lot more – it doesn’t take advantage of typed array transfer yet (which could avoid much of the copying, but would make the protocol a little more complex), nor does it try to consolidate typed arrays in any way (hundreds of separate ones can be sent per frame), and it uses a normal JS array for the commands themselves.
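One way the typed-array consolidation mentioned above could look is sketched here. WebGLWorker does not do this yet, so this is a hypothetical design: packFrameData copies a frame's many small arrays into a single ArrayBuffer, aligning each start to 8 bytes so that typed-array views of any element size can be recreated on the main thread, and returns the offsets needed to do so.

```typescript
// Hypothetical sketch of consolidating a frame's typed arrays into one buffer;
// this is NOT something WebGLWorker currently does.
interface Packed {
  buffer: ArrayBuffer;
  // [byteOffset, byteLength] for each original array, in order.
  ranges: Array<[number, number]>;
}

function packFrameData(arrays: ArrayBufferView[]): Packed {
  // Round each start up to 8 bytes so any typed-array view (e.g. Float64Array)
  // can later be created on the packed buffer at that offset.
  const align = (n: number) => Math.ceil(n / 8) * 8;
  let total = 0;
  for (const a of arrays) total = align(total) + a.byteLength;

  const buffer = new ArrayBuffer(total);
  const bytes = new Uint8Array(buffer);
  const ranges: Array<[number, number]> = [];
  let offset = 0;
  for (const a of arrays) {
    offset = align(offset);
    // Copy the raw bytes of this view, respecting its own offset and length.
    bytes.set(new Uint8Array(a.buffer, a.byteOffset, a.byteLength), offset);
    ranges.push([offset, a.byteLength]);
    offset += a.byteLength;
  }
  return { buffer, ranges };
}

// Usage: two uploads from one frame become one buffer plus offsets.
const positions = new Float32Array([0, 1, 2]);
const indices = new Uint16Array([0, 1, 2]);
const packed = packFrameData([positions, indices]);

// On the main thread, views are recreated over the single received buffer.
const [off0, len0] = packed.ranges[0];
const positionsView = new Float32Array(packed.buffer, off0, len0 / 4);
const [off1, len1] = packed.ranges[1];
const indicesView = new Uint16Array(packed.buffer, off1, len1 / 2);

// In a worker, the buffer could then be transferred instead of copied:
// self.postMessage({ ranges: packed.ranges, buffer: packed.buffer }, [packed.buffer]);
```

The commented-out line shows where transfer would come in: an ArrayBuffer listed in postMessage's transfer list is moved rather than copied and becomes detached (unusable) on the sending side, which is one reason transfer makes the protocol a little more complex.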

A final note on performance: All the measurements from before are for throughput, not latency. Running in a worker inherently adds some amount of latency, as we send user input to the worker and receive rendering back, so a frame or so of lag might occur. It’s encouraging that BananaBread, a first person shooter, feels responsive even with the extra latency – that type of game is typically very sensitive to lag.

“Real” WebGL in workers – that is, directly implemented at the browser level – is obviously still very important, even with the proxying polyfill. Aside from reducing latency, as just mentioned, real WebGL in workers also does not rely on the main thread to be free to execute GL commands, which the proxying approach does.

Other APIs

Proxying just WebGL isn’t enough for a typical WebGL-using application. There are some simple things like proxying keyboard and mouse events, but we also run into more serious issues, like the lack of HTML Image elements in workers. Both PlayCanvas and BananaBread use Image elements to load image assets and convert them to WebGL textures. WebGLWorker therefore includes code to proxy Image elements as well, using a similar approach: When you create an Image and set its src URL, we proxy that info and create an Image on the main thread. When we get a response, we fire the onload event, and so forth, after creating JS objects that look like what we have on the main thread.
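A minimal sketch of the Image proxying described above follows. The message shapes, the ProxiedImage class and its static send hook are hypothetical, and the main thread is simulated by calling handleReply directly. A real implementation would also need some way for a later texImage2D call in the worker to refer to the loaded image (for example, by its id), which this sketch omits.

```typescript
// Minimal sketch of proxying Image; ProxiedImage, the message shapes and the `send`
// hook are hypothetical, not WebGLWorker's API. `send` stands in for postMessage.
interface LoadImageMessage {
  type: "loadImage";
  id: number;
  src: string;
}

interface ImageReply {
  id: number;
  ok: boolean;
  width: number;
  height: number;
}

class ProxiedImage {
  private static nextId = 1;
  private static pending = new Map<number, ProxiedImage>();
  static send: (msg: LoadImageMessage) => void = () => {};

  readonly id = ProxiedImage.nextId++;
  width = 0;
  height = 0;
  onload: (() => void) | null = null;
  onerror: (() => void) | null = null;
  private url = "";

  get src(): string {
    return this.url;
  }

  // Setting src asks the main thread to create a real Image and load the URL.
  set src(value: string) {
    this.url = value;
    ProxiedImage.pending.set(this.id, this);
    ProxiedImage.send({ type: "loadImage", id: this.id, src: value });
  }

  // Called by the worker's message handler when the main thread reports back.
  static handleReply(reply: ImageReply): void {
    const img = ProxiedImage.pending.get(reply.id);
    if (img === undefined) return;
    ProxiedImage.pending.delete(reply.id);
    if (reply.ok) {
      // Mirror what the real Image looks like, then fire the event.
      img.width = reply.width;
      img.height = reply.height;
      img.onload?.();
    } else {
      img.onerror?.();
    }
  }
}

// Usage (main thread simulated by calling handleReply directly).
const sent: string[] = [];
ProxiedImage.send = (msg) => { sent.push(msg.src); };
const img = new ProxiedImage();
let loaded = false;
img.onload = () => { loaded = true; };
img.src = "textures/stone.png"; // hypothetical asset path
ProxiedImage.handleReply({ id: img.id, ok: true, width: 256, height: 128 });
```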

Another missing API turns out to be Workers themselves! While the spec supports workers created in workers (“nested workers” or “subworkers”), and while an html5rocks article from 2010 mentions subworkers as being supported, they seem to only work in Firefox and Internet Explorer so far (Chrome bug, Safari bug). BananaBread uses workers for 2 things during startup: to decompress a gzipped asset file, and to decompress crunched textures, which greatly improves startup speed. To get the BananaBread demo to work in browsers without nested workers, it includes a partial polyfill for Workers in Workers (which is far from complete – just enough to run the 2 workers actually needed). (This incidentally is the reason why Chrome startup on the BananaBread worker demo is slower than on the non-worker demo – the polyfill uses just one core, instead of 3, and it has no choice but to use eval, which is generally not optimized very well.)

Finally, there are a variety of missing APIs like requestAnimationFrame that are straightforward to fill in directly without proxying (see proxyClient.js/proxyWorker.js in the WebGLWorker repo). These are typically not hard to implement, but the sheer number of APIs a website might use that are not available in workers means that hitting one of them is the most likely problem when porting an app to run in a worker.
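requestAnimationFrame is an example of an API that can be filled in locally. The sketch below is one common timer-based approach, not necessarily what proxyWorker.js does: it targets a fixed 60 frames per second (an assumption), uses Date.now() as an approximation of the high-resolution timestamp a real requestAnimationFrame passes, and cannot synchronize with the display's actual refresh.

```typescript
// Timer-based requestAnimationFrame sketch for a worker; the 60fps target and the
// names below are assumptions, not WebGLWorker's code.
const FRAME_MS = 1000 / 60;

// Delay until the next frame, so frames land on a steady ~16.7ms cadence
// even when the previous callback used part of the budget.
function nextFrameDelay(now: number, lastFrameTime: number): number {
  return Math.max(0, lastFrameTime + FRAME_MS - now);
}

let lastFrameTime = 0;
let nextHandle = 1;
const timers = new Map<number, ReturnType<typeof setTimeout>>();

function requestAnimationFramePolyfill(callback: (time: number) => void): number {
  const handle = nextHandle++;
  const delay = nextFrameDelay(Date.now(), lastFrameTime);
  timers.set(handle, setTimeout(() => {
    timers.delete(handle);
    lastFrameTime = Date.now();
    callback(lastFrameTime); // Date.now() approximates rAF's high-resolution timestamp
  }, delay));
  return handle;
}

function cancelAnimationFramePolyfill(handle: number): void {
  const t = timers.get(handle);
  if (t !== undefined) {
    clearTimeout(t);
    timers.delete(handle);
  }
}
```

In a worker, these would be installed as requestAnimationFrame and cancelAnimationFrame on the worker's global scope before the application code runs.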

Summary

Based on what we’ve seen, it looks surprisingly practical to polyfill APIs via proxying in order to make them available in workers. And WebGL is one of the larger and higher-traffic APIs, so it is reasonable to expect that applying this approach to something like WebSockets or IndexedDB, for example, would be much more straightforward. (In fact, perhaps the web platform community could proxy an API first and use that to prioritize speccing and implementing it in workers?) Overall, it looks like proxying could enable more applications to run code in web workers, thus doing less on the main thread and maximizing responsiveness.

8 responses

  1. Blender fan wrote:

    Interesting, thanks!

    FYI in Blend4Web the Bullet/Asm.js physics engine (called uranium.js) is running in the separate Worker too.

  2. zproxy wrote:

    Cool. This will allow headless, off screen rendering.

    1. Alon Zakai wrote:

      This would still require a functioning WebGL context somewhere else in order to render graphics. You *can* run your app in a headless environment, but its output is a stream of WebGL commands, not pixels. If that’s what you want, though, then it would work right now.

      One interesting use case is you could run a WebGL application in node.js on a server, and stream the render commands to the client’s browser which renders them. That is, run on a distant machine instead of in a worker (in both cases the interface is asynchronous so it should work ok).

  3. Jon W wrote:

    Why does the game slow down in fps at all when adding a few bots? Unless you’re at 100% CPU, which seems unlikely, shouldn’t it be locked to vsync? And if not locked, shouldn’t it run faster than 60 Hz?

    1. Alon Zakai wrote:

      Browsers limit the refresh rate to 60fps. The numbers shown there are averages over time: what happens in practice is that even with 2 bots, it is mostly at 60fps, but now and then the 2 bots cause enough explosions and other costly effects that it briefly drops significantly below 60fps. And this happens more often with 10 bots.

  4. Patrick Pfeiffer wrote:

    For your information, this is the corresponding Chromium bug: https://code.google.com/p/chromium/issues/detail?id=245884

    They argue that one can offload non-rendering tasks to workers instead of offloading rendering tasks to workers.

    I am in the middle of writing a complex 3D multiplayer game for the browser, and currently I am trying to offload non-rendering tasks. So the game logic runs in the worker and gets rendered in the main thread. From what I can say so far: “it works”. Though with WebGLWorker I guess I would theoretically save some perf, since I wouldn’t have to “proxy” the network from the main thread into the game-logic thread (since WebSockets don’t work in workers in Firefox yet).

    1. Alon Zakai wrote:

      Thanks for the link. I agree with their position that use cases are needed before doing serious work, but the demos here are valid use cases in my opinion – entire WebGL-using applications/engines that we want to run in workers. Splitting out the non-rendering parts is hard. I’ll comment there too.

  5. Markus Henschel wrote:

    This sounds really exciting. I think it could help us with one of our current problems. Our loading routine needs to call GL commands and takes some time. This blocked the browser for several seconds. The native app would display a progress bar but this didn’t work with WebGL as there is no swap buffers command. I guess this should work now with a WebGL proxy.