Enabling Real-Time Communication on the Web Platform
October 19, 2011
Mozilla’s manifesto describes the internet as an integral part of modern life and a key component in communication. However, communication on the web has far to go before it’s as rich as face-to-face communication. Real-time video communication on the web should be easy, rich, and readily available to developers in a way that proprietary formats can’t be.
That’s why a new project is spinning up at Mozilla called WebRTC (Real-Time Communication). WebRTC will allow developers to use the web platform to include video and audio conferencing as part of their websites and applications, both mobile and on the desktop. In its first phase, WebRTC will make webcam feeds a primary object in the browser, allowing sites to create rich interactions such as video calling and conferencing. In later phases, WebRTC will allow interactions like co-browsing, in which users can share their screen with a friend.
Privacy and Security
Privacy and security are major concern in enabling open video communication on the web. A face and voice are two of the most identifiable kinds of shareable data, and keeping users in absolute control of who has access to them is vital. As the IETF states in its WebRTC draft document, the ability for users to control access to their webcam, be able to cancel communication at any time, and not be eavesdropped upon are essential.
Even a trusted site could be compromised, both during a call or after. And, since the sites themselves would control and display the UI of the call itself, Firefox needs to give the user both constant indication that they are in a call and the ability to disconnect at any time.
However, guarding against threats only goes so far towards keeping users in control of their webcam communication. Clear messaging, useful tools, and sensible defaults need to be in place for video conferencing to safely take root in the browser.
The first phase of enabling WebRTC will allow the most basic use case: giving a site access to a user’s webcam and microphone. The browser already serves as a mediator for other user data, such as location and access to cookies. Firefox usually asks for permissions using a door hanger notification. Door hangers stem from the URL bar to show the site is asking for a permission, and it extends past the content area to show that Firefox is the mediator of the permission request. Using a door hanger notification for WebRTC is both consistent within Firefox and correctly conveys visually that the site has requested access, and Firefox is asking the user for that permission.
Usually, these door hangers simply ask the user for a permission, and in a click the user can give it. However, webcam access requires a secondary stage: showing a preview of the webcam feed. This approach has three benefits:
- It gives users the ability to make sure their webcam and microphone work correctly
- If users had casually or accidentally accepted the webcam permission, nothing makes people more aware of what they’re about to transmit like showing them their own grubby mug
- It gives users the ability to fix their hair/put on a shirt/remove incriminating items from background before beginning call
In some ways, it’s unfortunate to ask users to pass through two dialogs to give webcam feed rather than one. After all, in most cases the site itself will be providing all necessary UI, and perhaps even a video preview before a call is initiated. So, this could all be redundant in many cases. However, we cannot predict what purpose a site may be requesting webcam feed for, nor what UI will be in place for the user on that page. Even with all our efforts against security threats, any request for webcam access must be treated as potentially malicious.
Once a user has given a site access to their webcam and is likely engaging in face-to-face communication, that interaction should be given a heightened level of priority within the browser. For a user to lose that tab or forget they are broadcasting could range from mildly embarrassing to, well, use your imagination. If a user is actively sharing their webcam feed, they should be able to jump to the tab where data’s being shared or simply cut their webcam feed from anywhere within Firefox. This will require at the very least a toolbar-level Firefox control that appears once a user’s actively sharing.
Designing and implementing a new API is always a complex process. If you’re interested in reading more or contributing to this project, here are some resources:
- Mozilla WebRTC feature page
- Mozilla notes on first WebRTC security discussion
- The IETF’s draft document on WebRTC Use-cases and Requirements
- Robert O’Callahan MediaStream Processing API Proposal
- Mozilla’s RTC API Proposal on GitHub and on Web Activities, a service discovery mechanism and light-weight RPC system between web apps and browsers
- Eric Rescorla’s paper on WebRTC Security Considerations, and his corresponding presentation slides (PDF)
- Cullen Jennings’s PDF slides on WebRTC API Design Questions
- W3C WebRTC meeting notes, including a PDF of Mozilla’s implementation status