{"id":62603,"date":"2020-05-07T00:00:00","date_gmt":"2020-05-07T00:00:00","guid":{"rendered":"http:\/\/blog.mozilla.org\/foxtail\/2020\/05\/07\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/"},"modified":"2021-02-08T20:14:45","modified_gmt":"2021-02-08T20:14:45","slug":"mozilla-research-shows-some-machine-voices-score-higher-than-humans","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/","title":{"rendered":"Mozilla research shows some machine voices score higher than humans"},"content":{"rendered":"<p><i>This blog post is to accompany the publication of the paper <\/i><a href=\"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/3313831.3376789\"><i>Choice of Voices: A Large-Scale Evaluation of Text-to-Speech Voice Quality for Long-Form Content<\/i><\/a><i> in the Proceedings of CHI\u201920, by Julia Cambre and Jessica Colnago from CMU, Jim Maddock from Northwestern, and Janice Tsai and Jofish Kaye from Mozilla.\u00a0<\/i><\/p>\n<p>In 2019, Mozilla\u2019s Voice team developed a method to evaluate the quality of text-to-speech voices. It turns out there was very little that had been done in the world of text to speech to evaluate voice for listening to <i>long<\/i>-form content \u2014 things like articles, book chapters, or blog posts. A lot of the existing work answered the core question of<i> &#8220;<\/i>can you <i>understand<\/i> this voice?\u201d So<a href=\"http:\/\/tts.speech.cs.cmu.edu\/courses\/11492\/slides\/tts_eval.pdf\"> a typical test<\/a> might use a syntactically correct but meaningless sentence, like \u201cThe masterly serials withdrew the collaborative brochure\u201d, and have a listener type that in. That way, the listener couldn&#8217;t guess missed words from other words in the sentence. 
But now that we\u2019ve reached a stage of computerized voice quality where so many voices can pass the comprehension test with flying colours, what\u2019s the next step?<\/p>\n<p>How can we determine if a voice is enjoyable to listen to, particularly for long-form content \u2014 something you\u2019d listen to for more than a minute or two? Our team had a lot of experience with this: we had worked closely with our colleagues at <a href=\"http:\/\/getpocket.com\">Pocket<\/a> to develop the <a href=\"https:\/\/help.getpocket.com\/article\/1081-listening-to-articles-in-pocket-with-text-to-speech\">Pocket Listen<\/a> feature, so you can listen to articles you\u2019ve saved while driving or cooking. But we still didn\u2019t know how to definitively say that one voice led to a better listening experience than another.<\/p>\n<p>The method we used was developed by <a href=\"https:\/\/jessica.colnago.org\/\">Jessica Colnago<\/a> during her internship at Mozilla, and it\u2019s pretty simple in concept. We took one article, <a href=\"https:\/\/hbr.org\/2013\/11\/reduce-your-stress-in-two-minutes-a-day\">How to Reduce Your Stress in Two Minutes a Day<\/a>, and we recorded each voice reading that article. Then we had 50 people on Mechanical Turk listen to each recording &#8212; 50 different people each time. (You can also <a href=\"https:\/\/ttschoice.github.io\/\">listen to clips from most of these recordings<\/a> to make your own judgement.) Nobody heard the article more than once. At the end of the article, we asked them a couple of questions to check they were actually listening, and to see what they thought about the voice.<\/p>\n<p>For example, we asked them to rate how much they liked the voice on a scale of one to five, and how willing they\u2019d be to listen to more content recorded by that voice. We also asked them why they thought that voice might be pleasant or unpleasant to listen to. 
We evaluated 27 voices, and here\u2019s one graph which represents the results. (The <a href=\"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/3313831.3376789\">paper<\/a> has much more rigorous analysis, and we explored various methods to sort the ratings, but the end results are all pretty similar. We also added a few more voices after the paper was finished, which is why there are different numbers of voices in different places in this research.)<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12620\" src=\"https:\/\/blog.mozilla.org\/wp-content\/uploads\/2020\/05\/voice-comparison-graph-plain1-600x207.png\" alt=\"voice comparison graph\" width=\"826\" height=\"285\" \/> As you can see, some voices rated better than others. The ones at the left are the ones people consistently rated positively, and the ones at the right are the ones that people liked less. For example, you\u2019ll notice that the default (American) <a href=\"https:\/\/ttschoice.github.io\/\">iOS female<\/a> voice is pretty far to the right, although the <a href=\"https:\/\/ttschoice.github.io\/\">Mac default<\/a> voice has a pretty respectable showing. I was proud to find that the <a href=\"https:\/\/ttschoice.github.io\/\">Mozilla Judy Wave1<\/a> voice, created by Mozilla research engineer <a href=\"https:\/\/github.com\/erogol\">Eren G\u00f6lge<\/a>, is rated up there along with some of the best ones in the field. It turns out the best electronic voices we tested are <a href=\"https:\/\/ttschoice.github.io\/\">Mozilla&#8217;s voices<\/a> and the <a href=\"https:\/\/ttschoice.github.io\/\">Polly Neural voices<\/a> from Amazon. 
And while we still have some licensing questions to figure out, making sure we can create sustainable, publicly accessible, high quality voices, it\u2019s exciting to see that we can do something in an open source way that is competitive with very well funded voice efforts out there, which don\u2019t have the same aim of being private, secure and accessible to all.<\/p>\n<p>We found there were some generalizable experiences. Listeners were 54% more likely to give a higher experience rating to the male voices we tested than the female voices. We also looked at the number of words spoken in a minute. Generally, our results indicated that there is a \u201cjust right speed\u201d in the range of 163 to 177 words per minute, and people didn\u2019t like listening to voices that were much faster or slower than that.<\/p>\n<p>But the more interesting result comes from one of the things we did at a pretty late stage in the process, which was to include some humans reading the article directly into a microphone. Those are the voices circled in red:<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12621\" src=\"https:\/\/blog.mozilla.org\/wp-content\/uploads\/2020\/05\/voice-comparison-graph-humans-marked1-600x207.png\" alt=\"voice comparison graph humans marked\" width=\"689\" height=\"238\" \/><\/p>\n<p>What we found was that some of our human voices were being rated lower than some of the robot voices. And that&#8217;s fascinating. That suggests we are at a point in technology, in society right now, where there are mechanically generated voices that actually sound better than humans. And before you ask, I listened to those recordings of human voices. <a href=\"https:\/\/ttschoice.github.io\/\">You can do the same<\/a>. Janice (the recording labelled Human 2 <a href=\"https:\/\/ttschoice.github.io\/\">in the dataset<\/a>) has a perfectly normal voice that I find pleasant to listen to. 
And <i>yet<\/i> some people were finding these mechanically generated voices better.<\/p>\n<p>That raises a whole host of interesting questions, concerns and opportunities. This is a snapshot of computerized voices from the last two years or so. Even since we did this study, we\u2019ve seen the quality of voices improve. What happens when computers are more pleasant to listen to than our own voices? What happens when our children would rather listen to a computer reading a story than to us?<\/p>\n<p>A potentially bigger ethical question concerns persuasion. One question we didn\u2019t ask in this study was whether people trusted or believed the content that was read to them. What happens when we can increase the number of people who believe something simply by changing the voice that it is read in? There are <a href=\"https:\/\/wpcarey.asu.edu\/marketing-degrees\/research-lab\">entire careers<\/a> exploring the boundaries of influence and persuasion; how does easy access to \u201ctrustable\u201d voices change our understanding of what signals point to trustworthiness? The BBC has been exploring <a href=\"https:\/\/www.bbc.co.uk\/rd\/blog\/2020-02-synthetic-voices-accent-artificial-interactive\">British attitudes to regional accents<\/a> in a similar way \u2014 drawing, fascinatingly, from a study of how British people reacted to different voices <a href=\"https:\/\/genome.ch.bbc.co.uk\/page\/057fcba119844489bd09f557842ff901?page=5\">on the radio in 1927<\/a>. 
We are clearly continuing a long tradition of analyzing the impact of voice and voices on how we understand and feel about information.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This blog post is to accompany the publication of the paper Choice of Voices: A Large-Scale Evaluation of Text-to-Speech Voice Quality for Long-Form Content in the Proceedings of CHI\u201920, by Julia Cambre and Jessica Colnago from CMU, Jim Maddock from Northwestern, and Janice Tsai and Jofish Kaye from Mozilla.\u00a0 In 2019, Mozilla\u2019s Voice team developed [&hellip;]<\/p>\n","protected":false},"author":1421,"featured_media":12617,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[],"coauthors":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Mozilla research shows some machine voices score higher than humans<\/title>\n<meta name=\"description\" content=\"Research conducted at Mozilla shows that some human voices are rated by listeners lower than some robot voices. And that&#039;s fascinating. 
That suggests we are at a point in technology, in society right now, where there are mechanically generated voices that actually sound better than humans.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/\",\"url\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/\",\"name\":\"Mozilla research shows some machine voices score higher than humans\",\"isPartOf\":{\"@id\":\"https:\/\/blog.mozilla.org\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2020\/05\/voice-comparison-graph-plain.png\",\"datePublished\":\"2020-05-07T00:00:00+00:00\",\"dateModified\":\"2021-02-08T20:14:45+00:00\",\"author\":{\"@id\":\"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/860c2279205ab98ed4f99ab0fdbf9ed0\"},\"description\":\"Research conducted at Mozilla shows that some human voices are rated by listeners lower than some robot voices. And that's fascinating. 
That suggests we are at a point in technology, in society right now, where there are mechanically generated voices that actually sound better than humans.\",\"breadcrumb\":{\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/#primaryimage\",\"url\":\"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2020\/05\/voice-comparison-graph-plain.png\",\"contentUrl\":\"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2020\/05\/voice-comparison-graph-plain.png\",\"width\":640,\"height\":360},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/blog.mozilla.org\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Mozilla research shows some machine voices score higher than humans\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/#website\",\"url\":\"https:\/\/blog.mozilla.org\/en\/\",\"name\":\"The Mozilla Blog\",\"description\":\"News and Updates about Mozilla\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blog.mozilla.org\/en\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/860c2279205ab98ed4f99ab0fdbf9ed0\",\"name\":\"Jofish 
Kaye\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/image\/167cd340fa731a02c1db7662f6531e35\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/d496571b2fc8245506fb6d9000f727db?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/d496571b2fc8245506fb6d9000f727db?s=96&d=mm&r=g\",\"caption\":\"Jofish Kaye\"},\"url\":\"https:\/\/blog.mozilla.org\/en\/author\/jkayemozilla-com\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Mozilla research shows some machine voices score higher than humans","description":"Research conducted at Mozilla shows that some human voices are rated by listeners lower than some robot voices. And that's fascinating. That suggests we are at a point in technology, in society right now, where there are mechanically generated voices that actually sound better than humans.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/","url":"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/","name":"Mozilla research shows some machine voices score higher than 
humans","isPartOf":{"@id":"https:\/\/blog.mozilla.org\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/#primaryimage"},"image":{"@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/#primaryimage"},"thumbnailUrl":"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2020\/05\/voice-comparison-graph-plain.png","datePublished":"2020-05-07T00:00:00+00:00","dateModified":"2021-02-08T20:14:45+00:00","author":{"@id":"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/860c2279205ab98ed4f99ab0fdbf9ed0"},"description":"Research conducted at Mozilla shows that some human voices are rated by listeners lower than some robot voices. And that's fascinating. That suggests we are at a point in technology, in society right now, where there are mechanically generated voices that actually sound better than humans.","breadcrumb":{"@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/#primaryimage","url":"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2020\/05\/voice-comparison-graph-plain.png","contentUrl":"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2020\/05\/voice-comparison-graph-plain.png","width":640,"height":360},{"@type":"BreadcrumbList","@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/mozilla-research-shows-some-machine-voices-score-higher-than-humans\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/bl
og.mozilla.org\/en\/"},{"@type":"ListItem","position":2,"name":"Mozilla research shows some machine voices score higher than humans"}]},{"@type":"WebSite","@id":"https:\/\/blog.mozilla.org\/en\/#website","url":"https:\/\/blog.mozilla.org\/en\/","name":"The Mozilla Blog","description":"News and Updates about Mozilla","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.mozilla.org\/en\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/860c2279205ab98ed4f99ab0fdbf9ed0","name":"Jofish Kaye","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/image\/167cd340fa731a02c1db7662f6531e35","url":"https:\/\/secure.gravatar.com\/avatar\/d496571b2fc8245506fb6d9000f727db?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d496571b2fc8245506fb6d9000f727db?s=96&d=mm&r=g","caption":"Jofish 
Kaye"},"url":"https:\/\/blog.mozilla.org\/en\/author\/jkayemozilla-com\/"}]}},"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/posts\/62603"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/users\/1421"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/comments?post=62603"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/posts\/62603\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/media\/12617"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/media?parent=62603"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/categories?post=62603"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/tags?post=62603"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/coauthors?post=62603"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}