October 19, 2011
Mozilla’s manifesto describes the internet as an integral part of modern life and a key component in communication. However, communication on the web has far to go before it’s as rich as face-to-face communication. Real-time video communication on the web should be easy, rich, and readily available to developers in a way that proprietary formats can’t be.
That’s why a new project is spinning up at Mozilla called WebRTC (Real-Time Communication). WebRTC will allow developers to use the web platform to include video and audio conferencing as part of their websites and applications, both mobile and on the desktop. In its first phase, WebRTC will make webcam feeds a primary object in the browser, allowing sites to create rich interactions such as video calling and conferencing. In later phases, WebRTC will allow interactions like co-browsing, in which users can share their screen with a friend.
Privacy and Security
Privacy and security are major concerns in enabling open video communication on the web. A face and voice are two of the most identifiable kinds of shareable data, and keeping users in absolute control of who has access to them is vital. As the IETF states in its WebRTC draft document, it is essential that users can control access to their webcam, can cancel communication at any time, and cannot be eavesdropped upon.
Even a trusted site could be compromised, either during a call or after. And, since the sites themselves would control and display the UI of the call itself, Firefox needs to give the user both a constant indication that they are in a call and the ability to disconnect at any time.
However, guarding against threats only goes so far towards keeping users in control of their webcam communication. Clear messaging, useful tools, and sensible defaults need to be in place for video conferencing to safely take root in the browser.
The first phase of enabling WebRTC will allow the most basic use case: giving a site access to a user’s webcam and microphone. The browser already serves as a mediator for other user data, such as location and access to cookies. Firefox usually asks for permissions using a door hanger notification. Door hangers stem from the URL bar to show that the site is asking for a permission, and they extend past the content area to show that Firefox is the mediator of the permission request. Using a door hanger notification for WebRTC is consistent within Firefox and correctly conveys visually that the site has requested access and that Firefox is asking the user for that permission.
Usually, these door hangers simply ask the user for a permission, and in a click the user can give it. However, webcam access requires a secondary stage: showing a preview of the webcam feed. This approach has three benefits:
- It gives users the ability to make sure their webcam and microphone work correctly
- If users had casually or accidentally accepted the webcam permission, nothing makes people more aware of what they’re about to transmit than showing them their own grubby mug
- It gives users the ability to fix their hair/put on a shirt/remove incriminating items from the background before beginning a call
In some ways, it’s unfortunate to ask users to pass through two dialogs to share a webcam feed rather than one. After all, in most cases the site itself will be providing all necessary UI, and perhaps even a video preview before a call is initiated. So, this could all be redundant in many cases. However, we cannot predict what purpose a site may be requesting a webcam feed for, nor what UI will be in place for the user on that page. Even with all our efforts against security threats, any request for webcam access must be treated as potentially malicious.
Once a user has given a site access to their webcam and is likely engaging in face-to-face communication, that interaction should be given a heightened level of priority within the browser. For a user to lose that tab or forget they are broadcasting could range from mildly embarrassing to, well, use your imagination. If a user is actively sharing their webcam feed, they should be able to jump to the tab where data’s being shared or simply cut their webcam feed from anywhere within Firefox. This will require at the very least a toolbar-level Firefox control that appears once a user’s actively sharing.
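To make the flow above a little more concrete, here is a minimal sketch of it as a state machine: permission prompt, then preview, then active sharing, with a stop control that works from any stage. Everything in it (the state names, the event names, the functions `nextSharingState`, `isTransmitting`, and `runSharingEvents`) is hypothetical and made up for illustration. It is not Firefox code or part of any WebRTC API, just one way the two-stage request and the "cut the feed from anywhere" requirement might fit together.

```typescript
// Hypothetical model of one site's webcam request. All names are illustrative.
type SharingState = "idle" | "prompting" | "previewing" | "sharing" | "stopped";

type SharingEvent =
  | "siteRequestsCamera"   // the site asks for webcam/microphone access
  | "userAllows"           // the user accepts the door hanger permission
  | "userConfirmsPreview"  // the user has seen the preview and continues
  | "userDenies"           // the user rejects the request at either stage
  | "userStops";           // the user cuts the feed from the browser-level control

function nextSharingState(state: SharingState, event: SharingEvent): SharingState {
  // Denying or stopping is honored from every stage, so the user can always back out.
  if (event === "userDenies" || event === "userStops") {
    return state === "idle" ? "idle" : "stopped";
  }
  switch (state) {
    case "idle":
    case "stopped":
      return event === "siteRequestsCamera" ? "prompting" : state;
    case "prompting":
      // Accepting the permission only leads to the preview, never straight to sharing.
      return event === "userAllows" ? "previewing" : state;
    case "previewing":
      return event === "userConfirmsPreview" ? "sharing" : state;
    default:
      // While sharing, only the stop/deny events handled above change the state.
      return state;
  }
}

// The feed is only transmitted once both stages have been passed.
function isTransmitting(state: SharingState): boolean {
  return state === "sharing";
}

// Replays a list of events starting from "idle".
function runSharingEvents(events: SharingEvent[]): SharingState {
  return events.reduce<SharingState>((s, e) => nextSharingState(s, e), "idle");
}
```

In this sketch a site cannot skip the preview: replaying `["siteRequestsCamera", "userConfirmsPreview"]` leaves the request stuck in `"prompting"`, and `"userStops"` ends sharing no matter which stage the request is in.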
Designing and implementing a new API is always a complex process. If you’re interested in reading more or contributing to this project, here are some resources:
- Mozilla WebRTC feature page
- Mozilla notes on first WebRTC security discussion
- The IETF’s draft document on WebRTC Use-cases and Requirements
- Robert O’Callahan’s MediaStream Processing API Proposal
- Mozilla’s RTC API Proposal on GitHub and on Web Activities, a service discovery mechanism and light-weight RPC system between web apps and browsers
- Eric Rescorla’s paper on WebRTC Security Considerations, and his corresponding presentation slides (PDF)
- Cullen Jennings’s PDF slides on WebRTC API Design Questions
- W3C WebRTC meeting notes, including a PDF of Mozilla’s implementation status
October 6, 2011
“No one wants to die. Even people who want to go to heaven don’t want to die to get there. And yet death is the destination we all share. No one has ever escaped it. And that is as it should be, because Death is very likely the single best invention of Life. It is Life’s change agent. It clears out the old to make way for the new. Right now the new is you, but someday not too long from now, you will gradually become the old and be cleared away. Sorry to be so dramatic, but it is quite true.
Your time is limited, so don’t waste it living someone else’s life. Don’t be trapped by dogma — which is living with the results of other people’s thinking. Don’t let the noise of others’ opinions drown out your own inner voice. And most important, have the courage to follow your heart and intuition. They somehow already know what you truly want to become. Everything else is secondary.”
February 24, 1955 – October 5, 2011
The Mozilla user experience team often designs features that represent sites to users in a variety of ways. For example, Firefox tabs display favicons and page titles, while Panorama displays favicons, titles, and page thumbnails. So, I thought it would be useful to investigate the effectiveness of various ways of representing sites to users.
One interesting piece of research on page representation was published by Shaun Kaasten, Saul Greenberg, and Christopher Edwards at the University of Calgary in their paper How People Recognize Previously Seen Web Pages from Titles, URLs and Thumbnails (download it here). This team conducted a series of studies, most of which involved increasing one variable which represented a site the user had previously visited (such as thumbnail size) until the user recognized it, at which point the user would buzz in to stop the expansion and identify the site.
Here are some key takeaways from what the Canadians learned:
– The graph above plots the thumbnail sizes at which test participants could recognize a domain (black lines) and a specific page within a domain (blue lines). The dotted lines show all responses, and the solid lines show only correct responses. You can see that by the time a thumbnail was 96² pixels (that is, 96 × 96), 60% of test subjects had identified it. 80% of test subjects identified sites by 144² pixels, and by 304² pixels everyone had identified the site.
– Users’ guesses about what site a thumbnail was representing were correct about 90% of the time. Not bad, considering on most sites they had no readable text to go by until the thumbnail was over 96² pixels. This shows how effective thumbnails are at identifying sites to users.
– Color and layout were the most important factors for identifying a site when the thumbnail was 64² pixels and smaller. From 64² to 96² pixels, color, layout, images, and text were equally important. Above 100² pixels, text was most important. Presumably, sites still unidentified at that size were visually similar to other sites, leaving text as the only effective differentiator.
– Looking at only truncated URLs and page titles, test subjects could correctly identify sites 90% of the time. The researchers experimented with URL and title representation by showing users right-, middle-, and left-truncated strings and recording when they buzzed in to identify the site correctly.
– The graph above shows the running sum of correct answers in identifying sites based on only page title (top graph) and URL (bottom graph). You can see that right truncation proved the most effective for domain-level site identification. For titles and URLs that were truncated on the right, sites were correctly identified 15% of the time with 5-6 characters revealed, 30% of the time with 8 characters, 60% of the time with 13-15 characters, and 80% of the time with 25-31 characters. Left truncation was the most effective for identifying a specific site within a domain. So, if you want users to identify a site based on a string, at least 15ish characters are needed for even a majority. If you want users to identify a subdomain, clip the left side of the URL. To identify the domain itself, clip the right.
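For anyone who wants to play with the two truncation styles, here is a small sketch. The helper names `truncateRight` and `truncateLeft`, the `ELLIPSIS` marker, and the example URL are my own assumptions for illustration. They are not taken from the paper or from Firefox. The 15-character budget is just the "15ish" figure from above.

```typescript
// Marker appended or prepended to show that text was clipped (an illustrative choice).
const ELLIPSIS = "…";

// Right truncation: keep the first `visible` characters and clip the end.
// For a URL this preserves the domain, which the study found best for
// domain-level identification.
function truncateRight(text: string, visible: number): string {
  const keep = Math.max(0, visible);
  if (text.length <= keep) return text;
  return text.slice(0, keep) + ELLIPSIS;
}

// Left truncation: keep the last `visible` characters and clip the start.
// For a URL this preserves the end of the path, which helps tell apart
// specific pages within the same domain.
function truncateLeft(text: string, visible: number): string {
  const keep = Math.max(0, visible);
  if (text.length <= keep) return text;
  return ELLIPSIS + text.slice(text.length - keep);
}

// Hypothetical URL, clipped to 15 visible characters each way.
const exampleUrl = "example.com/articles/webrtc";
const domainView = truncateRight(exampleUrl, 15); // "example.com/art…"
const pageView = truncateLeft(exampleUrl, 15);    // "…articles/webrtc"
```

With the same character budget, the right-truncated string still shows the domain, while the left-truncated one shows which page within it you are looking at.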