April 7, 2019

Perfect negotiation in WebRTC

Contributed by Jan-Ivar Bruaroey

New preface: What if you could add and remove media to and from a live WebRTC connection, without having to worry about state, glare (signaling collisions), role (what side you’re on), or what condition the connection is in? You’d simply call pc.addTrack(track, stream) regardless of time and place, and your track would just show up on the opposite side, without risk of terminal connection failure. A pipe dream? Too much to ask? Actually, now that Chrome has finally fixed its negotiationneeded event, this almost works! But you’re not using it, because “almost” doesn’t cut it. It only works 95% of the time, and the stakes from glare are too high in Chrome (no way to roll back from it, just pc.close()). But this original promise of the API is still within reach. We just need browsers other than Firefox to implement rollback, and fix some glaring (no pun intended) races in a few methods in the specification.

Perhaps few things in the WebRTC API cause as many pilot errors as the asymmetric exchange of SDP and ICE. Its stateful signalingState and timing-sensitive ICE trickling can be a source of races if programmed incorrectly. As if that’s not challenging enough, keeping the two sides in sync and the two directions apart throws a lot of people for a loop (more puns).

What if I told you we can tame this complexity? That the manual way most people go about negotiating in WebRTC today is inferior, and possibly racy?

I’m talking about negotiationneeded and "rollback": two mechanisms that can be used in concert to abstract negotiation away entirely. Yet no one is using them, because the former hasn’t worked until recently, and the latter doesn’t work in Chrome yet. We’ll introduce both.

negotiationneeded tames local changes

The negotiationneeded event is to the offer/answer exchange what the icecandidate event is to the ICE candidate exchange. You may already know how to use the latter. You do this to send:

pc.onicecandidate = ({candidate}) => io.send({candidate});

…and you do this to receive:

io.onmessage = async ({data: {description, candidate}}) => {
  if (candidate) await pc.addIceCandidate(candidate);
}

…and then the browsers take it from there. It’s plug and play easy! We don’t care how, when or why candidates need to be exchanged, we just hook up the web socket tube, and leave the browsers to it. We’ve effectively abstracted away the ICE exchange; the browsers take care of it.
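The snippets in this post lean on an io object that is never spelled out. As a rough sketch only, here is one way it could look, assuming a WebSocket-like transport (anything with send(string) and an onmessage callback) and JSON as the wire format. The name createSignaling is made up for illustration and is not part of any WebRTC API.

```javascript
// Hypothetical helper (not a WebRTC API): wraps a WebSocket-like object
// so that io.send(obj) and io.onmessage({data: obj}) carry plain objects.
function createSignaling(socket) {
  const io = {
    onmessage: null,
    send(message) {
      // Candidates and descriptions serialize to plain JSON objects.
      socket.send(JSON.stringify(message));
    },
  };
  socket.onmessage = ({data}) => {
    // Hand the parsed message to whatever handler is installed on io.
    if (io.onmessage) io.onmessage({data: JSON.parse(data)});
  };
  return io;
}
```

In a browser this might be wired up as createSignaling(new WebSocket(url)), using the URL of your own signaling server and waiting for the socket to open before sending.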

So why are we torturing ourselves with manual offer/answer? Just set it up the same way. To send we do:

pc.onnegotiationneeded = async () => {
  await pc.setLocalDescription(await pc.createOffer());
  io.send({description: pc.localDescription});
}

…and to receive we do:

io.onmessage = async ({data: {description, candidate}}) => {
  if (description) {
    await pc.setRemoteDescription(description);
    if (description.type == "offer") {
      await pc.setLocalDescription(await pc.createAnswer());
      io.send({description: pc.localDescription});
    }
  } else if (candidate) await pc.addIceCandidate(candidate);
}

That’s it!

Now, whenever we shake media at the RTCPeerConnection object, it gets sent automatically:

const stream = await navigator.mediaDevices.getUserMedia({video: true,
                                                          audio: true});
for (const track of stream.getTracks()) {
  pc.addTrack(track, stream);
}

…to the other side where it shows up here:

pc.ontrack = ({streams}) => video.srcObject = streams[0];

Negotiation happens automatically as needed. We’ve abstracted away negotiation; the browsers take care of it.

This is an API I can get behind!

Perhaps surprisingly, this works on both ends! We’re only sending media one way so far, but the other side can send stuff back the same way, by calling pc.addTrack(track, stream) on its end. Negotiation is automatic there too. The offerer/answerer roles are simply reversed in that case.

Keep making changes to your peer connection object, any change you want, and it will renegotiate as needed on the next JavaScript tick. You’ll never have to worry about negotiation again!

Except when both sides make changes at the exact same time…

But… what about glare?

“Glare” is when both sides send an offer to each other at the same time, upsetting their state machines. Their wires get crossed. It’s like when you and a friend start talking at the same time, and you both go “Have you heard… No, sorry, you go ahead!” – except you’re computers without neural nets, so you keep on doing it and never recover.

Everything is ruined, back to negotiating by hand! – Well, hold on. Let’s see if we can fix it.

rollback tames remote changes

Glare is an application problem, because we can solve it a number of ways. For instance: we could avoid glare entirely if we used an out-of-band data channel to cause all changes to always come from one end only. But that’s a super-ugly API, and we were so close! So instead, we’re going to use "rollback" to save the day. We’re aiming for perfect negotiation.

The polite peer

In short, we’ll make one peer be polite, a pushover, the one to say “No, sorry, you go ahead!”—It will roll back its offer in the face of an incoming offer. We’re in luck, because this is what "rollback" does:

await pc.setLocalDescription({type: "rollback"});

It takes us back to the "stable" signalingState, regardless of what state we were in, so we can let the other peer go first.
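To see why glare hurts and what rollback buys us, here is a toy model of signalingState. It is a deliberately partial sketch, not the spec’s state machine: it leaves out provisional answers, lists only the transitions this post uses, and treats anything else as an error. The names transitions, nextState and politeGlareWalkthrough are invented for illustration.

```javascript
// Toy model only. Keys are "<state>|<local or remote>|<description type>".
const transitions = {
  "stable|local|offer": "have-local-offer",
  "stable|remote|offer": "have-remote-offer",
  "have-local-offer|remote|answer": "stable",
  "have-remote-offer|local|answer": "stable",
  "have-local-offer|local|rollback": "stable",
};

function nextState(state, side, type) {
  const next = transitions[`${state}|${side}|${type}`];
  if (!next) throw new Error(`InvalidStateError: ${side} ${type} in ${state}`);
  return next;
}

// Walks the polite peer through a collision and records each state reached.
function politeGlareWalkthrough() {
  const seen = [];
  let state = "stable";
  const step = (side, type) => {
    state = nextState(state, side, type);
    seen.push(state);
  };

  step("local", "offer");             // we sent our own offer
  let glare = false;
  try {
    nextState(state, "remote", "offer"); // their offer arrives: collision
  } catch (e) {
    glare = true;
  }
  step("local", "rollback");          // polite peer backs off
  step("remote", "offer");            // now their offer applies cleanly
  step("local", "answer");            // and we answer it
  return {glare, seen};
}
```

politeGlareWalkthrough() returns {glare: true, seen: ["have-local-offer", "stable", "have-remote-offer", "stable"]}: in this model the colliding offer is rejected until the local offer has been rolled back.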

Here’s how to implement the polite peer:

io.onmessage = async ({data: {description, candidate}}) => {
  if (description) {
    if (description.type == "offer" && pc.signalingState != "stable") {
      if (!polite) return;
      await Promise.all([
        pc.setLocalDescription({type: "rollback"}),
        pc.setRemoteDescription(description)
      ]);
    } else {
      await pc.setRemoteDescription(description);
    }
    if (description.type == "offer") {
      await pc.setLocalDescription(await pc.createAnswer());
      io.send({description: pc.localDescription});
    }
  } else if (candidate) await pc.addIceCandidate(candidate);
}

This is a little more code than before. To unpack it, let’s break out just the part that’s new, the part handling collision:

    if (description.type == "offer" && pc.signalingState != "stable") {
      if (!polite) return;
      await Promise.all([
        pc.setLocalDescription({type: "rollback"}),
        pc.setRemoteDescription(description)
      ]);
    } else {

We use a global variable, polite, which is true for one peer, and false for the other. The second line of the snippet is the full implementation of the impolite peer, which sticks to its guns. It simply ignores the incoming offer:

      if (!polite) return;

The remaining code is the polite peer. It rolls back its local offer, and then sets the remote offer instead.

Patience wins, because a rollback merely postpones our changes until the next negotiation, which will automatically happen right after this one. We avoid the collision, which makes the other side happy. negotiationneeded will fire as soon as we’re back to "stable", so we can be confident everything will be right on our end as well in the end.

Why Promise.all?

If you’re wondering why we’re using Promise.all, it has to do with ICE timing. We must avoid racing with the signaling channel, since ICE candidates may be in the pipe right behind the offer that just came in. If that happens, our message handler gets called again, and it won’t wait for our promise-based async function here to have finished. So we must ensure the ICE agent will be ready to process these candidates.

We do this by queuing our two method calls right away on the peer connection object, in the right order, ahead of any addIceCandidate calls that may happen. Yes, that’s right: RTCPeerConnection implements a queue internally, where only one of these asynchronous operations may run at a time. This puts any addIceCandidate calls third in line, by which time things will be in the expected "have-remote-offer" state, and the ICE agent will be ready to process any candidates. Had we instead awaited each method in turn, we’d leave a time window between the two calls for addIceCandidate to come in prematurely.
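The queueing argument can be hard to picture, so here is a small simulation with no real peer connection involved. OperationsChain is a hypothetical stand-in for the internal queue described above, and the three operations only log their names; the point is just the ordering.

```javascript
// Hypothetical model of a one-at-a-time operations queue.
class OperationsChain {
  constructor() {
    this.tail = Promise.resolve();
  }
  enqueue(operation) {
    const result = this.tail.then(() => operation());
    // A failed operation must not block the ones queued behind it.
    this.tail = result.catch(() => {});
    return result;
  }
}

const tick = () => new Promise(resolve => setTimeout(resolve, 0));

async function simulate(queueBothAtOnce) {
  const chain = new OperationsChain();
  const log = [];
  // Each fake operation takes a moment, then records that it ran.
  const op = name => chain.enqueue(async () => {
    await tick();
    log.push(name);
  });

  // The handler for the incoming offer starts running...
  const offerHandler = (async () => {
    if (queueBothAtOnce) {
      await Promise.all([op("rollback"), op("setRemoteDescription")]);
    } else {
      await op("rollback");
      await op("setRemoteDescription");
    }
  })();

  // ...and a candidate arrives before that handler has finished.
  const candidateHandler = op("addIceCandidate");

  await Promise.all([offerHandler, candidateHandler]);
  return log;
}
```

simulate(true) resolves to ["rollback", "setRemoteDescription", "addIceCandidate"], while simulate(false) resolves to ["rollback", "addIceCandidate", "setRemoteDescription"], which is exactly the premature candidate we want to avoid.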

One last race to fix

There’s a similar race on the other end. We must fix our negotiationneeded code from earlier, to this:

pc.onnegotiationneeded = async () => {
  const offer = await pc.createOffer();
  if (pc.signalingState != "stable") return;
  await pc.setLocalDescription(offer);
  io.send({description: pc.localDescription});
}

“But,” you say, “negotiationneeded is only ever fired from the "stable" state, so what’s going on?” Well, createOffer is asynchronous and takes time, so a remote offer may race us and sometimes take us out of "stable" to "have-remote-offer", beating our call to setLocalDescription, causing it to fail. No rollback is needed in this case, so we simply return. negotiationneeded will happen again once we return to "stable", so nothing will be lost by bailing here.

But this is ugly!

Two points:

First point: Yes, that’s why I’m blogging about this now, somewhat prematurely. I’ve filed 3 spec issues that I hope will remove the need for the Promise.all as well as the "stable" test. These come down to having a slightly different API for performing rollback, an API for setting a new local description in a glare-safe way, and, somewhat related, a way to restart ICE that doesn’t interfere with negotiationneeded.

Second point: The point is to write this code once, to abstract away this state machine. Once this is done correctly—and it’s air tight—then the rest of the application won’t have to worry about this complexity anymore. You may cut’n’paste “the polite peer” above once, and from now on simply call methods on the peer connection without worrying about this state machine ever again.
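As one possible way to package it, the handlers from this post can be collected into a single function. This is a sketch under assumptions, not a library API: setupPerfectNegotiation is a made-up name, pc is expected to behave like an RTCPeerConnection, io like the signaling object used throughout, and how the two sides agree on who is polite is left to the application.

```javascript
// Hypothetical wrapper: installs the handlers from this post on pc and io.
function setupPerfectNegotiation(pc, io, polite) {
  pc.onicecandidate = ({candidate}) => io.send({candidate});

  pc.onnegotiationneeded = async () => {
    const offer = await pc.createOffer();
    // A remote offer may have raced us while createOffer was running.
    if (pc.signalingState != "stable") return;
    await pc.setLocalDescription(offer);
    io.send({description: pc.localDescription});
  };

  io.onmessage = async ({data: {description, candidate}}) => {
    if (description) {
      if (description.type == "offer" && pc.signalingState != "stable") {
        // Collision: the impolite peer ignores the incoming offer.
        if (!polite) return;
        // The polite peer queues both calls at once, ahead of any candidates.
        await Promise.all([
          pc.setLocalDescription({type: "rollback"}),
          pc.setRemoteDescription(description)
        ]);
      } else {
        await pc.setRemoteDescription(description);
      }
      if (description.type == "offer") {
        await pc.setLocalDescription(await pc.createAnswer());
        io.send({description: pc.localDescription});
      }
    } else if (candidate) await pc.addIceCandidate(candidate);
  };
}
```

After one call to setupPerfectNegotiation(pc, io, polite) on each side, with polite true on exactly one of them, the rest of the application would only touch pc through calls like pc.addTrack(track, stream).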

If you’re wondering why WebRTC couldn’t take care of this automatically, then a big reason is that some people wanted a high-level API, and others wanted a low-level one. The working group couldn’t find a one-size-fits-all. Specifically, solving this in the current architecture without application involvement would likely have required WebRTC to define its own signaling channel semantics and foist it on everyone, and that would not have fit everyone’s needs.

The good news is that higher-level APIs can be written on top of this. Hopefully, this blog post helps illustrate one way to do that.

Why be perfect?

Show of hands: How many of you have joined a WebRTC call, had it not connect, then refreshed the page and had it work? That’s buggy and unacceptable!

The ambition here is no less than to take negotiation off your hands for all modifications to your peer connections, regardless of when they happen, and have it work. From my early tests, this appears to work. I won’t lie, we’ve discovered some bugs in Firefox from this exploration, and we’re working on fixing them. It’s also early days in Chrome, which doesn’t support this yet, so there’s time to get this right.

If what this does seems rudimentary, it is not: The spec algorithm for determining when negotiation is needed is actually quite complicated. For instance, pc.addTrack(track) behaves unpredictably when you’re the answerer and your signaling state is not "stable". Depending on how many unused transceivers there were available at that time, a second round of negotiation may or may not be needed once we get back to "stable" state. Would your manual negotiation code handle that?

Demo time!

Yes, there’s a fiddle! You can stop reading here if you don’t care how it works, or just run it and play with the checkboxes in it. You’ll need to run it in Firefox, preferably Nightly, since Chrome doesn’t support this yet.

To have realistic-looking code in terms of signaling, we have one peer in an iframe, and the other peer outside it, using postMessage for signaling.

Click the “Result” tab and toggle the various check boxes to add or remove media to or from the peer connection objects on either end (inside the iframe or outside it). You should see videos start to move in response on the other end from where you’re clicking. Try to find the bugs (there are some)!

Once we skip past the button setup, there are some button-click handlers to add/remove media. Scrolling down, we’ll recognize the code we’ve covered already, plus a white-noise generator. There’s also a data channel in here, which we use to implement the “Both noise” checkbox. Checking it causes changes (and negotiation) to happen from both ends at nearly the same time. Most of the fun races (and bugs) are there. We’re working on it. Cheers!
