If I told you that two weeks ago IETF and W3C finally published the standards for WebRTC, your response would probably be to ask what all those acronyms were. Read on to find out!
Widely available high quality videoconferencing is one of the real successes of the Internet. The idea of videoconferencing is of course old (go watch that scene in 2001 where Heywood Floyd makes a video call to his family on a Bell videophone), but until fairly recently it required specialized equipment or at least downloading specialized software. Simply put, WebRTC is videoconferencing (VC) in a Web browser, with no download: you just go to a Web site and make a call. Most of the major VC services have a WebRTC version: this includes Google Meet, Cisco WebEx, and Microsoft Teams, plus a whole bunch of smaller players.
A toolkit, not a phone
WebRTC isn’t a complete videoconferencing system; it’s a set of tools built in to the browser that take care of many of the hard pieces of building a VC system so that you don’t have to. This includes:
- Capturing the audio and video from the computer’s microphone and camera. This also includes what’s called Acoustic Echo Cancellation: removing echos (hopefully) even when people don’t wear headphones.
- Allowing the two endpoints to negotiate their capabilities (e.g., “I want to send and receive video at 1080p using the AV1 codec”) and arrive at a common set of parameters.
- Establishing a secure connection between you and other people on the call. This includes getting your data through any NATs or firewalls that may be on your network.
- Compressing the audio and video for transmission to the other side and then reassembling it on receipt. It’s also necessary to deal with situations where some of the data is lost, in which case you want to avoid having the picture freeze or hearing audio glitches.
This functionality is embedded in what’s called an application programming interface (API): a set of commands that the programmer can give the browser to get it to set up a video call. The upshot of this is that it’s possible to write a very basic VC system in a very small number of lines of code. Building a production system is more work, but with WebRTC, the browser does much of the work of building the client side for you.
Standardization
Importantly, this functionality is all standardized: the API itself was published and by the World Wide Web Consortium(W3C) and the network protocols (encryption, compression, NAT traversal, etc.) were standardized by the Internet Engineering Task Force (IETF). The result is a giant pile of specifications, including the API specification, the protocol for negotiating what media will be sent or received, and a mechanism for sending peer-to-peer data. All in all, this represents a huge amount of work by too many people to count spanning a decade and resulting in hundreds of pages of specifications.
The result is that it’s possible to build a VC system that will work for everyone right in their browser and without them having to install any software
Ironically, the actual publication of the standards is kind of anticlimactic: every major browser has been shipping WebRTC for years and as I mentioned above, there are a large number of WebRTC VC systems. This is a good thing: widespread deployment is the only way to get confidence that technologies really work as expected and that the documents are clear enough to implement from. What the standards reflect is the collective judgement of the technical community that we have a system which generally works and that we’re not going to change the basic pieces. It also means that it’s time for VC providers who implemented non-standard mechanisms to update to what the standards say[1].
Why do you care about any of this?
At this point you might be thinking “OK, you all did a lot of work, but why does it matter? Can’t I just download Zoom? There are a number of important reasons why WebRTC is a big deal, as described below.
Security
Probably the most important reason is security. Because WebRTC runs entirely in the browser, it means that you don’t need to worry about security issues in the software that the VC provider wants you to download. As an example, last year Zoom had a number of high profile security flaws that would, for instance, have allowed web sites to add you to calls without your permission, or mount what’s called a Remote Code Execution attack that would allow attackers to run their code on your computer. By contrast, because WebRTC doesn’t require a download, you’re not exposed to whatever vulnerabilities the vendor may have in their client. Of course browsers don’t have a perfect security record, but every major browser invests a huge amount in security technologies like sandboxing. Moreover, you’re already running a browser, so every additional application you run increases your security risk. For this reason, Kaspersky recommends running the Zoom Web client, even though the experience is a lot worse than the app.[2]
The second security advantage of WebRTC-based conferencing is that the browser controls access to the camera and microphone. This means that you can easily prevent sites from using them, as well as be sure when they are in use. For instance, Firefox prompts you before letting a site use the camera and microphone and then shows something in the URL bar whenever they are live.
WebRTC is always encrypted in transit without the VC system having to do anything else, so you mostly don’t have to ask whether the vendor has done a good job with their encryption. This is one of the pieces of WebRTC that Mozilla was most involved in putting into place, in line with Mozilla Manifesto principle number 4 (Individuals’ security and privacy on the internet are fundamental and must not be treated as optional.). Even more exciting, we’re starting to see work on built-in end-to-end encrypted conferencing for WebRTC built on MLS and SFrame. This will help address the one major security feature that some native clients have that WebRTC does not provide: preventing the service from listening in on your calls. It’s good to see progress on that front.
Low Friction
Because WebRTC-based video calling apps work out of the box with a standard Web browser, they dramatically reduce friction. For users, this means they can just join a call without having to install anything, which makes life a lot easier. I’ve been on plenty of calls where someone couldn’t join — often because their company used a different VC system — because they hadn’t downloaded the right software, and this happens a lot less now that it just works with your browser. This can be an even bigger issue in enterprises have restrictions on what software can be installed.
For people who want to stand up a new VC service, WebRTC means that they don’t need to write a new piece of client software and get people to download it. This makes it much easier to enter the market without having to worry about users being locked into one VC system and unable to use yours.
None of this means that you can’t build your own client and a number of popular systems such as WebEx and Meet have downloadable endpoints (or, in the case of WebEx, hardware devices you can buy). But it means you don’t have to, and if you do things right, browser users will be able to talk to your custom endpoints, thus giving casual users an easy way to try out your service without being too committed.[3]
Enhancing The Web
Because WebRTC is part of the Web, not isolated into a separate app, that means that it can be used not just for conferencing applications but to enhance the Web itself. You want to add an audio stream to your game? Share your screen in a webinar? Upload video from your camera? No problem, just use WebRTC.
One exciting thing about WebRTC is that there turn out to be a lot of Web applications that can use WebRTC besides just video calling. Probably the most interesting is the use of WebRTC “Data Channels”, which allow a pair of clients to set up a connection between them which they can use to directly exchange data. This has a number of interesting applications, including gaming, file transfer, and even BitTorrent in the browser. It’s still early days, but I think we’re going to be seeing a lot of DataChannels in the future.
The bigger picture
By itself, WebRTC is a big step forward for the Web: it If you’d told people 20 years ago that they would be doing video calling from their browser, they would have laughed at you — and I have to admit, I was initially skeptical — and yet I do that almost every day at work. But more importantly, it’s a great example of the power the Web has to make to make people’s lives better and of what we can do when we work together to do that.
- Technical note: probably the biggest source of problems for Firefox users is people who implemented a Chrome-specific mechanism for handling multiple media streams called “Plan B”. The IETF eventually went with something called “Unified Plan” and Chrome supports it (as does Google Meet) but there are still a number of services, such as Slack and Facebook Video Calling, which do Plan B only which means they don’t work properly with Firefox, which implemented Unified Plan. ↩︎
- The Zoom Web client is an interesting case in that it’s only partly WebRTC. Unlike (say) Google Meet, Zoom Web uses WebRTC to capture audio and video and to transmit media over the network, but does all the audio and video locally using WebAssembly. It’s a testament to the power of WebAssembly that this works at all, but a head-to-head comparison of Zoom Web to other clients such as Meet or Jitsi reveals the advantages of using the WebRTC APIs built into the browser. ↩︎
- Google has open sourced their WebRTC stack, which makes it easier to write your own downloadable client, including one which will interoperate with browsers. ↩︎