{"id":62389,"date":"2017-11-29T00:00:00","date_gmt":"2017-11-29T00:00:00","guid":{"rendered":"http:\/\/blog.mozilla.org\/foxtail\/2017\/11\/29\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/"},"modified":"2021-02-08T20:33:37","modified_gmt":"2021-02-08T20:33:37","slug":"announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/","title":{"rendered":"Announcing the Initial Release of Mozilla\u2019s Open Source Speech Recognition Model and Voice Dataset"},"content":{"rendered":"<p>With the holiday, gift-giving season upon us, many people are about to experience the ease and power of new speech-enabled devices. Technical advancements have fueled the growth of speech interfaces through the availability of machine learning tools, resulting in more Internet-connected products that can listen and respond to us than ever before.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-11080\" src=\"https:\/\/blog.mozilla.org\/wp-content\/uploads\/2017\/11\/CommonVoiceAppThanks.jpg\" alt=\"\" width=\"1943\" height=\"1306\" srcset=\"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/CommonVoiceAppThanks.jpg 1943w, https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/CommonVoiceAppThanks-300x202.jpg 300w, https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/CommonVoiceAppThanks-1024x688.jpg 1024w, https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/CommonVoiceAppThanks-768x516.jpg 768w, https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/CommonVoiceAppThanks-1536x1032.jpg 1536w, 
https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/CommonVoiceAppThanks-1000x672.jpg 1000w\" sizes=\"(max-width: 1943px) 100vw, 1943px\" \/><\/p>\n<p>At Mozilla we\u2019re <a href=\"https:\/\/blog.mozilla.org\/blog\/2017\/07\/28\/machine-learning-speech-recognition\/\">excited about the potential of speech recognition<\/a>. We believe this technology can and will enable a wave of innovative products and services, and that it should be available to everyone.<\/p>\n<p>And yet, while this technology is still maturing, we\u2019re seeing significant barriers to innovation that can put people first. These challenges inspired us to launch Project DeepSpeech and Project Common Voice. Today, we have reached two important milestones in these projects for the speech recognition work of our <a href=\"https:\/\/research.mozilla.org\/machine-learning\/\">Machine Learning Group<\/a> at Mozilla.<\/p>\n<p>I\u2019m excited to announce the <a href=\"https:\/\/hacks.mozilla.org\/2017\/11\/a-journey-to-10-word-error-rate\/\">initial release of Mozilla\u2019s open source speech recognition model<\/a> that has an accuracy approaching what humans can perceive when listening to the same recordings. 
We are also <a href=\"https:\/\/medium.com\/mozilla-open-innovation\/sharing-our-common-voice-mozilla-releases-second-largest-public-voice-data-set-e88f7d6b7666\">releasing the world\u2019s second largest publicly available voice dataset<\/a>, to which nearly 20,000 people around the world contributed.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-11169 size-full\" src=\"https:\/\/blog.mozilla.org\/wp-content\/uploads\/2017\/11\/MZ_CommonVoice_blog_post_LR.jpg\" alt=\"\" width=\"1134\" height=\"638\" srcset=\"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/MZ_CommonVoice_blog_post_LR.jpg 1134w, https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/MZ_CommonVoice_blog_post_LR-300x169.jpg 300w, https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/MZ_CommonVoice_blog_post_LR-1024x576.jpg 1024w, https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/MZ_CommonVoice_blog_post_LR-768x432.jpg 768w, https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/MZ_CommonVoice_blog_post_LR-1000x563.jpg 1000w\" sizes=\"(max-width: 1134px) 100vw, 1134px\" \/><\/p>\n<p><b>An open source speech-to-text engine approaching user-expected performance<\/b><\/p>\n<p>There are only a few commercial-quality speech recognition services available, dominated by a small number of large companies. This reduces user choice and available features for startups, researchers or even larger companies that want to speech-enable their products and services.<\/p>\n<p>This is why we started <a href=\"https:\/\/github.com\/mozilla\/DeepSpeech\">DeepSpeech<\/a> as an open source project. 
Together with a community of like-minded developers, companies and researchers, we have applied sophisticated machine learning techniques and a variety of innovations to build a speech-to-text engine that has a word error rate of just 6.5% on LibriSpeech\u2019s test-clean dataset.<\/p>\n<p>In our initial release today, we have included pre-built packages for Python, Node.js and a command-line binary that developers can use right away to experiment with speech recognition.<\/p>\n<p><b>Building the world\u2019s most diverse publicly available voice dataset, optimized for training voice technologies<\/b><\/p>\n<p>One reason so few services are commercially available is a lack of data. Startups, researchers or anyone else who wants to build voice-enabled technologies needs high-quality, transcribed voice data on which to train machine learning algorithms. Right now, they can only access fairly limited datasets.<\/p>\n<p>To address this barrier, we launched <a href=\"https:\/\/voice.mozilla.org\/\">Project Common Voice<\/a> this past July. Our aim is to make it easy for people to donate their voices to a publicly available database, and in doing so build a voice dataset that everyone can use to train new voice-enabled applications.<\/p>\n<p>Today, we\u2019ve released the first tranche of donated voices: nearly 400,000 recordings, representing 500 hours of speech. <a href=\"https:\/\/voice.mozilla.org\/data\">Anyone can download this data<\/a>.<\/p>\n<p>What\u2019s most important for me is that our work represents the world around us. We\u2019ve seen contributions from more than 20,000 people, reflecting a diversity of voices globally. Too often existing speech recognition services can\u2019t understand people with different accents, and many are better at understanding men than women &#8212; this is a result of biases within the data on which they are trained. 
Our hope is that the number of speakers and their different backgrounds and accents will create a globally representative dataset, resulting in more inclusive technologies.<\/p>\n<p>To this end, while we\u2019ve started with English, we are working hard to ensure that Common Voice will support voice donations in multiple languages beginning in the first half of 2018.<\/p>\n<p>Finally, having experienced the challenge of finding publicly available voice datasets ourselves, we have also compiled links, alongside the Common Voice data, to download all the other large voice collections we know about.<\/p>\n<p><b>Our open development approach<\/b><\/p>\n<p>We at Mozilla believe technology should be open and accessible to all, and that includes voice. Our approach to developing this technology is <a href=\"https:\/\/medium.com\/mozilla-open-innovation\/being-open-by-design-deec6768706\">open by design<\/a>, and we very much welcome more collaborators and contributors to work alongside us.<\/p>\n<p>As the web expands beyond the 2D page into the myriad ways we now connect to the Internet, through new means such as VR, AR, speech, and languages, we\u2019ll continue our mission to ensure the Internet is a global public resource, open and accessible to all.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>With the holiday, gift-giving season upon us, many people are about to experience the ease and power of new speech-enabled devices. Technical advancements have fueled the growth of speech interfaces through the availability of machine learning tools, resulting in more Internet-connected products that can listen and respond to us than ever before. 
At Mozilla we\u2019re [&hellip;]<\/p>\n","protected":false},"author":1435,"featured_media":11081,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[19596],"coauthors":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Announcing the Initial Release of Mozilla\u2019s Open Source Speech Recognition Model and Voice Dataset<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/\",\"url\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/\",\"name\":\"Announcing the Initial Release of Mozilla\u2019s Open Source Speech Recognition Model and Voice 
Dataset\",\"isPartOf\":{\"@id\":\"https:\/\/blog.mozilla.org\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/CommonVoiceApp.jpg\",\"datePublished\":\"2017-11-29T00:00:00+00:00\",\"dateModified\":\"2021-02-08T20:33:37+00:00\",\"author\":{\"@id\":\"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/0a0eb705e852a8c5b9655e2e9f6dfba4\"},\"breadcrumb\":{\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/#primaryimage\",\"url\":\"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/CommonVoiceApp.jpg\",\"contentUrl\":\"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/CommonVoiceApp.jpg\",\"width\":2082,\"height\":1331},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/blog.mozilla.org\/en\/\"},{\"@type\":\"ListItem\",\"position\":2
,\"name\":\"Announcing the Initial Release of Mozilla\u2019s Open Source Speech Recognition Model and Voice Dataset\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/#website\",\"url\":\"https:\/\/blog.mozilla.org\/en\/\",\"name\":\"The Mozilla Blog\",\"description\":\"News and Updates about Mozilla\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blog.mozilla.org\/en\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/0a0eb705e852a8c5b9655e2e9f6dfba4\",\"name\":\"Sean White\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/image\/7559e8e163a0252b6f9d24151a76ea06\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/fe07d32ca76ce2b328f94cad497adf92?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/fe07d32ca76ce2b328f94cad497adf92?s=96&d=mm&r=g\",\"caption\":\"Sean White\"},\"url\":\"https:\/\/blog.mozilla.org\/en\/author\/swhitemozilla-com\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Announcing the Initial Release of Mozilla\u2019s Open Source Speech Recognition Model and Voice Dataset","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/","url":"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/","name":"Announcing the Initial Release of Mozilla\u2019s Open Source Speech Recognition Model and Voice Dataset","isPartOf":{"@id":"https:\/\/blog.mozilla.org\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/#primaryimage"},"image":{"@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/#primaryimage"},"thumbnailUrl":"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/CommonVoiceApp.jpg","datePublished":"2017-11-29T00:00:00+00:00","dateModified":"2021-02-08T20:33:37+00:00","author":{"@id":"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/0a0eb705e852a8c5b9655e2e9f6dfba4"},"breadcrumb":{"@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-sou
rce-speech-recognition-model-and-voice-dataset\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/#primaryimage","url":"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/CommonVoiceApp.jpg","contentUrl":"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2017\/11\/CommonVoiceApp.jpg","width":2082,"height":1331},{"@type":"BreadcrumbList","@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/blog.mozilla.org\/en\/"},{"@type":"ListItem","position":2,"name":"Announcing the Initial Release of Mozilla\u2019s Open Source Speech Recognition Model and Voice Dataset"}]},{"@type":"WebSite","@id":"https:\/\/blog.mozilla.org\/en\/#website","url":"https:\/\/blog.mozilla.org\/en\/","name":"The Mozilla Blog","description":"News and Updates about Mozilla","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.mozilla.org\/en\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/0a0eb705e852a8c5b9655e2e9f6dfba4","name":"Sean White","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/image\/7559e8e163a0252b6f9d24151a76ea06","url":"https:\/\/secure.gravatar.com\/avatar\/fe07d32ca76ce2b328f94cad497adf92?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/fe07d32ca76ce2b328f94cad497adf92?s=96&d=mm&r=g","caption":"Sean 
White"},"url":"https:\/\/blog.mozilla.org\/en\/author\/swhitemozilla-com\/"}]}},"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/posts\/62389"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/users\/1435"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/comments?post=62389"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/posts\/62389\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/media\/11081"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/media?parent=62389"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/categories?post=62389"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/tags?post=62389"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/coauthors?post=62389"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}