For Mozilla’s Project Common Voice, Kelly Davis collects the largest possible array of voices. For a future in which everyone can talk to computers and not just a few. What does a future require, to be a good one? Does innovation always mean that we have to abandon the stuff we used to value?
Few have the gift of foreseeing the outlines of the future – as a space for aspiration to bloom, and where values and remembrance coexist. Long before Siri broke into our conversations or Amazon’s Echo eavesdropped on our domestic secrets, Kelly Davis, Manager of the Machine Learning Group at Mozilla, had a clear vision of how we could speak to computers in a way that would benefit all.
We meet Kelly at the Teehaus of the Englischer Garten at Berlin’s Botanical Gardens, one of his favorite spots. It’s also a place with history. The Teehaus sits on the foundations of the house owned by German theater legend Gustav Gründgens. It was built in 1952 to celebrate the inauguration of the English Garden, which is itself a homage to British-German collaboration during the Berlin blockade. It’s early, and the tea house is closed, so we set off for a walk. Soft haze touches the regiments of flowers.
A future that values all human beings
There are other elegant consistencies Kelly admires as much as the beauty of these gardens. They have been a constant in a life devoted to creating a future that values all human beings, rather than chasing ephemeral thrills. His style, a nod to the Roaring Twenties with a flat cap, suspenders and highly polished spectacles, is one of these constants. “It’s a style that suits me. I just like it.”
Having the freedom to be who you are. Being aware that participation, diversity and opportunity are rights shared by all individuals. That’s the thread that runs through all Kelly’s work.
Project Common Voice, run by Kelly and his team at Mozilla with the help of a host of voluntary contributors, aims to rethink the rules of speech recognition technology and steer it towards a more ethical approach.
A speech recognition project against monopolism
This ethical sense had been stirring in Kelly for ages. In the early years of the millennium, Kelly founded a start-up with a friend to develop speech recognition technology that would make it easier to search the web on mobile devices. Before that, they had both worked at a start-up that wrote dialog systems which could converse naturally with their users. But lack of funding scuppered the project. That is an ever-present danger because developing such technologies requires massive amounts of data: data that programmers feed into machines so that all that circuitry can understand our inquiries and spit out great answers. For Kelly, “One of the key issues is that all the big companies are holding the data, they silo it, and all small companies have, besides paying the bigger ones, no way in.”
Thousands of volunteers contribute to the project. Together they are creating a recognizable and diverse voice against this kind of monopolism. Project Common Voice opens up more perspectives on the process of developing speech recognition technologies. That’s one of the main assets in creating a technology that understands and respects users’ needs.
Not-for-profit technology by Mozilla making everybody understood
It is only when the devotion to technology is not profit-based, but built on the pure desire to create a good solution for everybody, that speech recognition technologies can reach their full potential. That is what the Mozilla project is fighting for. To reach this goal, Mozilla brings together the various dialects and accents of its huge community and creates a database that can be used to build technologies able to understand, and be understood by, an incredible variety of humans. That is something no other company is trying to achieve.
“When I joined Mozilla two and a half years ago, I thought: Wow, Mozilla is in a unique position,” Kelly says. “Mozilla has a whole huge community with different users, with different accents and different languages that they speak. And no one else is going to be collecting these data. No other player would have the motivation to collect this data and make it public.”
Speech recognition that can provide home safety for the elderly
The enormous positive feedback the Common Voice team receives daily from volunteers and the community illustrates a shared view that drives all the participants. Creating a future that’s welcoming for every person calls not just for foresight but also for collaboration, an exchange of thoughts and the will to make things happen. Speech technologies crafted from collaboration and multiple perspectives, for instance, can be a boon for our common future. “Home hubs can provide a safety net for older people who would like to live at home instead of in an old people’s home. Speech synthesis technologies can be used to give a voice to those who’ve lost theirs, say as a result of cancer. For example, the demographic information contributors supply to Common Voice can be used to create a voice that sounds similar to the voice that someone has lost.”