How WebRTC speaker selection works

July 1, 2024

Contributed by Jan-Ivar Bruaroey

Today’s topic is speaker selection. If you’ve never given it much thought, or consider it straightforward, this post is for you. Let’s start by testing your general WebRTC knowledge:

Pop Quiz: If a website wants to play out of different speakers on your system, what permission must it have?

1. Speaker-selection permission
2. Microphone permission

If you answered 2 then chances are you know your WebRTC stuff well, but you’re probably on a Chromium browser.

Pause for a moment. Why would websites need microphone permission to control speaker output? Why should users expose themselves to possibly being recorded just to redirect songs to their portable speakers? They shouldn’t. It’s a permission escalation and an entirely unnecessary invasion of privacy. Nevertheless, in other browsers this remains the only answer today.

Thankfully, Firefox provides the more straightforward and privacy-preserving answer of 1.

In this post, we’ll look at how speaker selection works in different browsers, how permissions can be abused for fingerprinting purposes, and how to add speaker selection with a fallback so it works in all browsers.

How speaker selection works in Firefox (and the spec)

On desktop, Firefox supports the standards-track navigator.mediaDevices.selectAudioOutput() API. You only need a <button> to invoke it:

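speakers.onclick = async () => {
  // Optional: pass a previously used deviceId to skip the prompt in most cases.
  const deviceId = localStorage.previouslyUsedSpeakersId;
  const info = await navigator.mediaDevices.selectAudioOutput({deviceId});
  // Route the element's audio to the speakers the user picked.
  await videoElement.setSinkId(info.deviceId);
  speakers.innerText = `${info.label}...`;
  // Remember the choice for the next visit.
  localStorage.previouslyUsedSpeakersId = info.deviceId;
};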

Passing in a previously used deviceId you wish to use is optional, but allows you to skip a prompt in most cases.

If you’re in Firefox right now, you can put on your headset and try it here:

Pressing the button brings up Firefox’s speaker picker, and the button then updates to reflect your choice. Press the reset button that appears on the page to go back to the dynamic OS default. For non-Firefox users reading this article, or if you’re on your phone, here’s what this looks like in Firefox on desktop:

Screenshot: two Firefox windows. The first shows the built-in speaker picker drop-down with 3 speaker choices: AirPods, MacBook Pro Speakers and Marley Get Together. The second window is after the prompt, and shows two buttons: "Speakers: Marley Get Together", and a reset button.
And yes, my external Bluetooth speakers really are named “Marley Get Together”!

Users can sometimes change their speakers through the OS as well, but each website decides how to handle this. I will continue to hear the demo page playing through ‘Marley Get Together’ until I press the reset button. At that point it goes back to the current OS default. This is illustrative of the API. The demo page stores your choice in local storage. Note that Firefox has a bug where it doesn’t yet fire a devicechange event when the OS default speakers for Firefox are changed.
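The reset behavior described above can be sketched as follows. This is a minimal illustration, not the demo's actual source: the helper name resetSpeakers and its two parameters are assumptions made here so the logic can be read (and tested) outside a page. It relies on setSinkId("") selecting the browser's default output.

```javascript
// Sketch only: `element` is anything with setSinkId() (for example a <video>),
// and `storage` is a localStorage-like object. Both names are illustrative.
async function resetSpeakers(element, storage) {
  // An empty string asks the browser to follow its default audio output again.
  await element.setSinkId("");
  // Forget the remembered choice so the next visit also starts from the default.
  delete storage.previouslyUsedSpeakersId;
}

// Possible wiring on a page like the demo (assumes a button with id="reset"):
// reset.onclick = () => resetSpeakers(videoElement, localStorage);
```

Passing the element and storage in as arguments is only a convenience for this sketch; on a real page you would likely reference videoElement and localStorage directly, as the earlier snippet does.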

Strong privacy characteristics

Notably, this demo is from a different origin and runs here in an iframe with allow="speaker-selection" as the only permission policy delegated to it. It therefore cannot listen to you while you’re reading this. I wish that risk were hypothetical, but this blog site quite possibly has access to your microphone already from previous posts you’ve interacted with (unless you’re a first-time reader, in which case welcome, and never mind!). Note that Firefox does not persist permission unless you ask it to, but some other browsers do.
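For readers who want to embed a speaker picker the same way, here is a rough sketch of creating such an iframe from script. The function name createSpeakerOnlyFrame and the URL in the usage comment are hypothetical; the only part taken from this post is the allow="speaker-selection" value.

```javascript
// Sketch only: `doc` is a Document and `src` is the URL of the embedded page.
function createSpeakerOnlyFrame(doc, src) {
  const frame = doc.createElement("iframe");
  // Delegate speaker selection and nothing else. Because "microphone" is not
  // in the allow list, a cross-origin frame cannot request microphone access.
  frame.setAttribute("allow", "speaker-selection");
  frame.setAttribute("src", src);
  return frame;
}

// Possible usage in a browser (the URL is a placeholder):
// document.body.append(
//   createSpeakerOnlyFrame(document, "https://demo.example/speakers.html"));
```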

Moreover, the API adheres to the W3C Privacy Interest Group and W3C Technical Architecture Group’s design principle of not exposing device information of unused devices.

This seems like a slam dunk API. So, why hasn’t it been broadly implemented yet? Let’s look at that.

The microphone loophole

Unfortunately, cameras and microphones (which are about input not output) are exposed through an older API. That API does not align with modern privacy design principles for the Web. The data support this: 7.2% of the web calls the enumerateDevices() API, greatly exceeding its legitimate use at 0.2%. This highlights significant privacy concerns with 7% likely being web-trackers attempting to fingerprint users! It’s a whole other blog post.

But what’s relevant here is a misunderstood microphone loophole: devices that act as both speaker and microphone must be exposed as both in enumerateDevices() once the website has live microphone access. This allows for headset detection and lets websites promote full duplex audio if they wish. For example, a website might wish to auto-switch to AirPods speakers whenever the user uses the AirPods microphone, and only then. This synchronization puts everything on the same device clock, improving echo cancellation.
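As a rough illustration of the headset detection described above, a site could match a microphone to its companion speakers by comparing the groupId that enumerateDevices() reports for each device. The helper name findMatchingSpeaker is an assumption for this sketch, and it presumes the browser assigns the same groupId to both halves of one physical device.

```javascript
// Sketch only: `devices` is the array resolved by enumerateDevices(), and
// `micDeviceId` is the deviceId of the microphone currently in use.
function findMatchingSpeaker(devices, micDeviceId) {
  const mic = devices.find(
    (d) => d.kind === "audioinput" && d.deviceId === micDeviceId);
  if (!mic) return null;
  // Inputs and outputs of the same physical device share a groupId.
  const speaker = devices.find(
    (d) => d.kind === "audiooutput" && d.groupId === mic.groupId);
  return speaker || null;
}

// Possible usage in a browser, after the microphone track is live:
// const devices = await navigator.mediaDevices.enumerateDevices();
// const speaker = findMatchingSpeaker(devices, track.getSettings().deviceId);
// if (speaker) await videoElement.setSinkId(speaker.deviceId);
```

A null result means the microphone has no companion speakers in the list, which is exactly the case where this loophole cannot help with general speaker selection.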

But this loophole doesn’t work for speaker selection, because not all speakers have an associated microphone. Users would consider any website-built picker that only shows a subset of their speakers terrible (this is real, we get bug reports about it).

Even if we changed the spec to expose all speakers through the loophole, audience members would still need to grant microphone access even if they never intend to speak, a burdensome requirement.

~~For these reasons we recommend ignoring the microphone loophole for speaker selection. See below what we recommend instead.~~

How do other browsers manage speakers today?

How are the other browsers making do without a speaker selection API? The answer is different for each browser.

Safari doesn’t implement speaker switching (videoElement.setSinkId()) at all. It instead relies on the robust OS-level speaker selection offered by macOS and iOS. This might be great for privacy in Safari, but Firefox needs a solution that works on Windows, Linux, and Android as well.

Chrome was unique in exposing all speakers through the microphone loophole, violating the spec, and also in interpreting allow="microphone" as an implicit allow="speaker-selection" (likely for lack of support for the latter).

But that seems like a dead end. Once Chrome fixes crbug 40138537 to tighten exposure to active use instead of just permission, websites will need to actually turn on the microphone to select speakers.

This is a sad state of affairs for web developers, necessitating a different approach for each browser. Worse, many just copy what they do for microphones since this works in the dominant browser today.

As a result, this puts Firefox in a tough spot as the only browser without working speaker selection on many sites on some platforms, even though it respects user privacy by following the spec. For this reason, we’re reaching out to web developers. Your assistance can make a significant difference!

UPDATE 7/03/2025: Mozilla relented, and Firefox 140+ exposes all speakers upon microphone access for compatibility with other browsers. However, this doesn’t diminish the usefulness of selectAudioOutput() outside of video conferencing. So we look forward to the other browsers supporting it!

Making it work in all browsers

We’ll now demonstrate speaker selection working in all browsers. There’s no shim, but if you’ve already done this in Chromium, then the rest should be relatively simple.

We encourage websites to feature-detect the new API with a fallback to the old API. Like this (HTML):

Speakers:
<span id="newapi">
  <button id="speakers1">Default system output...</button>
  <button id="reset" hidden>Reset</button>
</span>
<span id="oldapi" style="display: none;">
  <select id="speakers2">
    <option value="">Default system output</option>
  </select>
</span>

Feature detection is then done like this in JS:

if (!("selectAudioOutput" in navigator.mediaDevices)) {
  newapi.style.display = "none";
  oldapi.style.display = "inline";
}

We won’t cover populating the <select> from the old API — it is similar to microphones — and we covered the new API already. Note, if you want to implement memory like this demo does, have a look at its source for more details.
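For completeness, here is one way the old-API fallback might populate the drop-down. It is a sketch under assumptions, not the demo's code: the helper name speakerOptions and the "Speaker N" fallback text are made up here, and the list is only as complete as the browser chooses to make it in enumerateDevices().

```javascript
// Sketch only: turns enumerateDevices() results into data for <option>s.
function speakerOptions(devices) {
  const speakers = devices.filter((d) => d.kind === "audiooutput");
  return speakers.map((d, i) => ({
    value: d.deviceId,
    // Labels can be empty when the browser withholds them, so fall back
    // to a generic numbered name.
    text: d.label || `Speaker ${i + 1}`,
  }));
}

// Possible wiring in a browser (assumes the <select id="speakers2"> above):
// const devices = await navigator.mediaDevices.enumerateDevices();
// for (const {value, text} of speakerOptions(devices)) {
//   speakers2.add(new Option(text, value));
// }
// speakers2.onchange = () => videoElement.setSinkId(speakers2.value);
```

Because the first option in the HTML has an empty value, selecting it passes "" to setSinkId(), which returns playback to the default output.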

This results in a demo that works in all browsers:

In Firefox, you’ll enjoy the same great speaker selection experience as above
In Chrome or Edge, you’ll see speaker choices listed the same way microphones are listed
In Safari… well, then you’re on macOS or iOS where this is easily done in the OS instead

Please consider applying this to your website. It will help speaker selection work in more browsers. Other vendors will eventually implement this, so you’ll be future-proofing your website.

I hope you enjoyed this post and found it helpful. I’d love to hear your feedback or thoughts. If you have comments or questions, you can reach me on X. See you on the web!