Unbundling MediaStreamTrackProcessor and VideoTrackGenerator
Earlier this year we promised more blog posts about new and exciting WebRTC functionality in standards. So today we're talking about `MediaStreamTrackProcessor` and `VideoTrackGenerator`! These two APIs let web developers convert a real-time video `MediaStreamTrack` to and from a stream of `VideoFrame`s, respectively. This simplifies real-time video processing in JavaScript.
These APIs are video-only and worker-only, which avoids jank when scrolling by never blocking the browser's real-time media pipeline on a busy main thread. They're available in Safari 18 Tech Preview and will be in Firefox by the beginning of next year.
Chrome, on the other hand, shipped proprietary versions of these APIs back in 2021. Those are instead exposed on the main thread, and also handle audio. This is obviously not great for web interoperability. We'll look at both versions, and propose solutions for web developers wondering how to manage this discrepancy.
Both versions promise convenience and performance. But are they equally necessary? And what about audio? Let’s explore.
The standard APIs
In the spec, `MediaStreamTrackProcessor({track})` must be created in a worker, so you `postMessage` your video track to the worker first.
In this simple example, the worker then pipes the processor's readable `VideoFrame`s through a transform and into a `VideoTrackGenerator`, whose `.track` member is `postMessage`d back to main and assigned to a video element. Once this has been set up, the media flows in real time through the transform, off the main thread.
If you’re in Safari 18 Tech Preview, click the “Result” tab to see two self-views, one regular and one sepia-toned and mirrored. Otherwise, don’t worry, you’ll see what it looks like in the subsequent example, which works in all browsers!
See the "JavaScript" tab for the code. The second argument to `postMessage({track}, [track])` transfers the track to the worker. This is due to `MediaStreamTrack`'s custom cloning semantics, which might be simplified in the future.
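Transfer lists move an object rather than copying it, leaving the sender's reference unusable. Since `MediaStreamTrack` only exists in browsers, here's a sketch of the same transfer semantics using an `ArrayBuffer` (which detaches on transfer) over a `MessageChannel`, available in browsers and in Node 15+:

```javascript
// Sketch of structured-clone transfer semantics, shown with an
// ArrayBuffer since MediaStreamTrack only exists in browsers.
const { port1, port2 } = new MessageChannel();

const buf = new ArrayBuffer(8);
console.log(buf.byteLength); // 8

// The second argument is the transfer list: the buffer is moved,
// not copied, and the sender's reference is detached.
port1.postMessage({ buf }, [buf]);
console.log(buf.byteLength); // 0 (detached on the sending side)

port1.close();
port2.close();
```

The same second-argument pattern is what moves the track itself, rather than a clone, into the worker.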
The worker then hooks a JavaScript transform into the browser’s real time media pipeline like this:
```js
await mstp.readable.pipeThrough(new TransformStream({transform})).pipeTo(vtg.writable);
```
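The `transform` callback handed to `TransformStream` has the shape `(chunk, controller) => void`. As a rough, browser-free sketch of that shape, with mock string "frames" standing in for real `VideoFrame`s and a tiny mock controller in place of the stream machinery:

```javascript
// The TransformStream transform callback receives each chunk plus a
// controller to enqueue results on. Mock "frames" stand in for
// VideoFrames here so this runs outside a browser.
function transform(frame, controller) {
  // In the real worker, pixel processing happens here before the
  // result is enqueued toward the VideoTrackGenerator's writable.
  controller.enqueue(`sepia(${frame})`);
}

const results = [];
const controller = { enqueue: (f) => results.push(f) };
for (const frame of ["frame0", "frame1"]) transform(frame, controller);
console.log(results); // ["sepia(frame0)", "sepia(frame1)"]
```

In the real pipeline the browser drives this callback once per frame, applying backpressure automatically.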
The example's `transform` processes pixels to create a sepia-toned, mirrored version of the video using `OffscreenCanvas` (scroll down the code above to reveal it).
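The exact pixel code lives in the fiddle, but one common sepia weighting (an assumption here, not necessarily the example's exact matrix) looks like this when applied to RGBA data obtained via `getImageData` on an `OffscreenCanvas`:

```javascript
// A common sepia weighting applied in place to RGBA pixel data.
// The coefficients are the widely used sepia matrix, shown here
// as an illustration of the per-pixel work the transform does.
function applySepia(pixels) {
  for (let i = 0; i < pixels.length; i += 4) {
    const r = pixels[i], g = pixels[i + 1], b = pixels[i + 2];
    pixels[i]     = Math.min(255, 0.393 * r + 0.769 * g + 0.189 * b);
    pixels[i + 1] = Math.min(255, 0.349 * r + 0.686 * g + 0.168 * b);
    pixels[i + 2] = Math.min(255, 0.272 * r + 0.534 * g + 0.131 * b);
    // alpha (pixels[i + 3]) is left untouched
  }
  return pixels;
}

// One mid-gray pixel shifts toward a warm sepia tone.
const px = applySepia(new Uint8ClampedArray([100, 100, 100, 255]));
console.log(Array.from(px));
```

`Uint8ClampedArray` rounds and clamps the weighted sums for free, which is why canvas pixel buffers use that type.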
That’s it! The above APIs have consensus in the W3C, and Firefox plans to implement them.
The proprietary APIs
Chrome exposes an early version of `MediaStreamTrackProcessor` and an earlier-named, incompatible `MediaStreamTrackGenerator`. Both are main-thread only. This unfortunately violates W3C design principle § 10.2.1, "Some APIs should only be exposed to dedicated workers", which exists to avoid blocking rendering.
While web developers can still transfer the resulting `readable` and `writable` to a worker themselves, many don't. Even when they do, transferring a property of a source is not the same as transferring that source: according to the Streams spec, the source and sink remain on the main thread. Chrome tries to move the source and sink implicitly anyway, violating the Streams spec, but this likely breaks down in general. For example, it's unclear what happens if you split the stream in two using `tee()` and transfer one of the branches. Unpredictable browser optimizations can lead to action at a distance.
Another mismatch is that while the spec's `VideoTrackGenerator` has a track, `MediaStreamTrackGenerator` is a track. And last, but not least, as the different name might suggest, it also does audio.
Firefox is unlikely to implement these proprietary APIs, for the reasons given above. No convincing main-thread use cases have been demonstrated. Also, audio data is already exposed through the more powerful `AudioWorkletNode` API.
The good news is these main-thread APIs are redundant: they’re easily polyfilled to work in all browsers today!
Polyfilling the proprietary APIs, part 1: video
This exercise will show how to do without the proprietary APIs by achieving functional parity with them. Here's an example where, for brevity, processing is done on the main thread:
This relies on the following polyfills in non-chromium browsers (under “Resources” if you “Edit in JSFiddle”):
The polyfills handle both audio and video, and the performance is decent. For video, they rely on `OffscreenCanvas` and `canvas.captureStream` (not off-screen yet), respectively. They can no doubt be optimized, e.g. by using workers to perform better in background tabs. Hopefully they're helpful until Firefox and Chrome implement the spec.
Polyfilling the proprietary APIs, part 2: audio
Here’s an audio example using the same polyfills:
1.) At the time of writing, a limitation in Safari 18 Tech Preview prevents the audio shim from working there. We've reached out to the Safari team, who said the `AudioData` API isn't quite ready yet. We will update here once this is fixed!
For audio, the polyfills use `AudioWorkletNode` in both directions. The polyfills could likely be optimized, e.g. by using `SharedArrayBuffer` to avoid copies, or workers to perform better in background tabs.
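To give a flavor of the sample shuffling such a polyfill does, here is a hypothetical helper (illustrative name and shape, not the polyfill's actual code) that converts interleaved samples into the per-channel planar layout that `AudioData`'s `"f32-planar"` format and worklet buffers use:

```javascript
// Hypothetical helper: deinterleave interleaved samples into
// per-channel Float32Array planes (the layout AudioData's
// "f32-planar" format and AudioWorklet buffers expect).
function deinterleave(interleaved, channels) {
  const frames = interleaved.length / channels;
  const planes = [];
  for (let ch = 0; ch < channels; ch++) {
    const plane = new Float32Array(frames);
    for (let i = 0; i < frames; i++) {
      plane[i] = interleaved[i * channels + ch];
    }
    planes.push(plane);
  }
  return planes;
}

// Stereo: samples alternate L, R, L, R...
const [left, right] = deinterleave(new Float32Array([0.1, -0.1, 0.2, -0.2]), 2);
console.log(left);  // the left plane: samples 0 and 2
console.log(right); // the right plane: samples 1 and 3
```

Each such copy is a candidate for elimination with a reusable `SharedArrayBuffer`, which is where the optimization headroom mentioned above comes from.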
This performs quite well in Firefox, and is well below its audio callback budget. The amount of garbage collected might be improved with reusable `SharedArrayBuffer`s given to `AudioData`. But for most practical purposes it shouldn't matter.
In any case, if performance is an issue, you're likely better off using `AudioWorkletNode` directly. You'll have more tools at your disposal, and more direct control over things like threading.
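For reference, the heart of an `AudioWorkletProcessor` is its `process(inputs, outputs)` callback, which operates on planar `Float32Array`s. Here's a sketch of that shape as a plain class so it runs anywhere; in a real worklet you would extend `AudioWorkletProcessor` and call `registerProcessor`:

```javascript
// Sketch of an AudioWorkletProcessor-style gain stage, written as a
// plain class so the per-sample math is visible outside a worklet.
class GainProcessor {
  constructor(gain = 0.5) { this.gain = gain; }
  // inputs/outputs: array of inputs, each an array of per-channel
  // Float32Array planes, matching AudioWorkletProcessor.process.
  process(inputs, outputs) {
    const input = inputs[0], output = outputs[0];
    for (let ch = 0; ch < input.length; ch++) {
      for (let i = 0; i < input[ch].length; i++) {
        output[ch][i] = input[ch][i] * this.gain;
      }
    }
    return true; // keep the processor alive
  }
}

// One mono input with two samples, halved by the gain stage.
const proc = new GainProcessor(0.5);
const outputs = [[new Float32Array(2)]];
proc.process([[new Float32Array([1, 2])]], outputs);
console.log(outputs[0][0]); // Float32Array [0.5, 1]
```

Running directly inside the audio rendering thread like this, with no round-trips through streams, is what gives `AudioWorkletNode` its performance edge.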
Can the standard APIs be polyfilled?
We think the standard `MediaStreamTrackProcessor` and `VideoTrackGenerator` APIs add value. They satisfy W3C design principles, are more likely to perform reliably even under heavy load across various implementations, and don't suffer from action-at-a-distance problems.
In the future, we might be able to polyfill them the same way using a new `OffscreenCanvas.captureStream()` method, but it doesn't exist today.
A core advantage of the standard APIs is that they guide applications to move `MediaStreamTrack` processing off the main thread. Firefox and Chrome still need to implement transfer of `MediaStreamTrack`, and that part is not easily polyfilled.
We look forward to the standard APIs being implemented in all browsers! Until then, perhaps the polyfills can help bridge the gap. We hope this has given you some ideas! As always, if you have comments, feel free to reach out on X!