Designing for voice

In the future, people will use their voice to access the internet as often as they use a screen. We’re already in the early stages of this trend: in 2016, Google reported that 20% of searches on mobile devices used voice; last year, smart speaker sales topped 146 million units, a 70% jump from 2018; and I’m willing to bet your mom or dad has adopted voice to make a phone call or dictate a text message.

I’ve been exploring voice interactions as the design lead for Mozilla’s Emerging Technologies team for the past two years. In that time we’ve developed Pocket Listen (a Text-to-Speech platform capable of converting any published web article into audio) and Firefox Voice (an experiment in accessing the internet by voice in the browser). This blog post is an introduction to designing for voice, based on the lessons our team learned researching and developing these projects. Luckily, if you’re a designer transitioning to working with voice and you already have a solid design process in place, you’ll find many of your skills transfer seamlessly. But some things are very different, so let’s dive in.

The benefits of voice

As with any design, it’s best to ground the work in the value it can bring people.

The accessibility benefits to a person with a physical impairment should be clear, but voice has the opportunity to aid an even larger population. Small screens are hard to read with aging eyes, typing on a virtual keyboard can be difficult, and understanding complex technology is always a challenge. Voice is emerging as a tool to overcome these limitations, turning cumbersome tasks into simple verbal interactions.

How can voice technology improve the user experience?

As designers, we’re often tasked with creating efficient and effortless interactions. Watch someone play music on a smart speaker and you’ll see how quickly thought turns to action when friction is removed. They don’t have to find and unlock their phone, launch an app, scroll through a list of songs and tap. Requesting a song happens in an instant with voice. A quote from one of our survey respondents summed it up perfectly:

“Being able to talk without thinking. It’s essentially effortless information ingestion.”

When is voice valuable?

When and where voice is likely to be used

Talking out loud to a device isn’t always appropriate or socially acceptable. We see this over and over again in research and real-world usage. People are generally uncomfortable talking to devices in public. The more private the setting, the better.

Graph showing Home, In the car, and At a friend’s house as the top three places people are comfortable using voice.

Hands-free and multitasking situations also drive voice usage: cooking, washing the dishes, or driving a car. These situations present opportunities to use voice because our hands or eyes are otherwise occupied.

But voice isn’t just used for giving commands. Text-to-Speech can generate audio from anything written, including articles. It’s a technology we successfully used to build and deploy Pocket Listen, which allows you to listen to articles you’ve saved for later.

Pocket Listen usage Feb 2020, United Kingdom

In the graph above, you’ll see that people primarily use Pocket Listen while commuting. By creating a new format to deliver the content, we’ve expanded when and where the product provides value.

Why is designing for voice hard?

Now that you know ‘why’ and ‘when’ voice is valuable, let’s talk about what makes it hard. These are the pitfalls to watch for when building a voice product.

What’s hard about designing for voice?

Voice is still a new technology and, as such, it can feel open-ended. It works well with a wide variety of uses and devices. It can be incorporated as input (Speech-to-Text) or output (Text-to-Speech), with or without a screen. You may be designing with a “Voice first mindset”, as Amazon recommends for the Echo Show, or the entire experience might unfold while the phone is buried in someone’s pocket.

In many ways, this kind of divergence is familiar if you’ve worked with mobile apps or responsive design. Personally, the biggest adjustment for me has been the infinite nature of voice. The limited real estate of a screen imposes constraints on the number and types of interactions available. With voice, there’s often no interface to guide an action, and speaking is more personal than using a screen, so requests and utterances vary greatly by personality and culture.

In a voice user interface, a person can ask anything and they can ask it a hundred different ways. A list is a great example: on a screen it’s easy to display a handful of options. In a voice interface, listing more than two options quickly breaks down. The user can’t remember the first choice or the exact phrasing they should say if they want to make a selection.
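To make the list problem concrete, here is a minimal sketch of one way to cope with it: splitting a longer list into short spoken prompts. It is an illustration, not how Pocket Listen or Firefox Voice works. The function name `spoken_pages`, the prompt wording, and the “say more” convention are all assumptions made for this example.

```python
from typing import List

# Assumption: cap each spoken prompt at two options, following the
# "more than two options quickly breaks down" observation above.
MAX_SPOKEN_OPTIONS = 2


def spoken_pages(options: List[str], page_size: int = MAX_SPOKEN_OPTIONS) -> List[str]:
    """Split a list of options into short prompts a listener can hold in memory."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")

    prompts = []
    for start in range(0, len(options), page_size):
        page = options[start:start + page_size]
        has_more = start + page_size < len(options)

        # Speak the choices using the exact words the user can say back.
        prompt = f"You can choose {' or '.join(page)}."
        if has_more:
            # Hypothetical convention: a single keyword moves to the next page.
            prompt += " Or say 'more' to hear other options."
        prompts.append(prompt)
    return prompts


if __name__ == "__main__":
    for line in spoken_pages(["jazz", "rock", "pop"]):
        print(line)
```

Each prompt repeats the exact words a person can say back, so they don’t have to remember a phrasing they heard several sentences ago.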

Which brings us to discovery, often cited as the biggest challenge facing voice designers and developers. It’s difficult for a user to know what features are available, what they can say, and how they have to say it. Teaching a system’s capabilities becomes essential, but it is difficult in practice. Even when you teach a few key phrases early in the experience, human recall of proper voice commands and syntax is limited. People rarely remember more than a few phrases.
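One common way to soften the discovery problem is to accept several phrasings for the same request and, when nothing matches, reply with a couple of example phrases instead of a bare error. The sketch below is a rough illustration under stated assumptions: the intent names, trigger phrases, and hint wording are hypothetical, and real voice products typically rely on far more capable language understanding than keyword matching.

```python
import re
from typing import Dict, List, Optional

# Hypothetical intents and trigger phrases, for illustration only.
INTENT_PHRASES: Dict[str, List[str]] = {
    "play_music": ["play", "put on", "listen to"],
    "read_article": ["read", "read out"],
}

# Keep hints to two examples; people rarely remember more than a few phrases.
HINTS = ["play some jazz", "read my saved article"]


def normalize(utterance: str) -> str:
    """Lowercase the utterance and drop punctuation so phrasing variants compare equal."""
    return re.sub(r"[^a-z0-9\s]", "", utterance.lower()).strip()


def match_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose trigger phrase appears as whole words, else None."""
    text = normalize(utterance)
    for intent, phrases in INTENT_PHRASES.items():
        for phrase in phrases:
            if re.search(rf"\b{re.escape(phrase)}\b", text):
                return intent
    return None


def respond(utterance: str) -> str:
    """Confirm a recognized request, or teach two things the user can say."""
    intent = match_intent(utterance)
    if intent is None:
        return f"Sorry, I can't help with that yet. Try saying '{HINTS[0]}' or '{HINTS[1]}'."
    return f"OK ({intent})."


if __name__ == "__main__":
    print(respond("Could you play some jazz?"))
    print(respond("What's the weather?"))
```

The fallback doubles as a teaching moment: every failed request ends with something the person can try next.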

The exciting future of voice

It’s still early days for voice interactions, and while the challenges are real, so are the opportunities. Voice brings the potential to deliver valuable new experiences that improve our connections to each other and the vast knowledge available on the internet. These are just a few examples of what I look forward to seeing more of:

“I like that my voice is the interface. When the assistant works well, it lets me do what I wanted to do quickly, without unlocking my phone, opening an app / going on my computer, loading a site, etc.”

As you can see, we’re at the beginning of an exciting journey into voice. Hopefully this intro has motivated you to dig deeper and ask how voice can play a role in one of your projects. If you want to explore more, have questions, or just want to chat, feel free to get in touch.