{"id":1987,"date":"2018-06-07T09:21:52","date_gmt":"2018-06-07T08:21:52","guid":{"rendered":"http:\/\/blog.mozilla.org\/press-uk\/?p=1987"},"modified":"2018-06-07T13:30:00","modified_gmt":"2018-06-07T12:30:00","slug":"more-common-voices","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/press-uk\/2018\/06\/07\/more-common-voices\/","title":{"rendered":"More Common Voices"},"content":{"rendered":"<p><em>Today we are excited to announce that <\/em><a href=\"http:\/\/voice.mozilla.org\/\"><em>Common Voice<\/em><\/a><em>, Mozilla\u2019s initiative to crowdsource a large dataset of human voices for use in speech technology, is going multilingual! Thanks to the tremendous efforts from Mozilla\u2019s communities and our deeply engaged language partners you can now donate your voice in German, French and Welsh, and we are working to <\/em><a href=\"https:\/\/voice.mozilla.org\/languages\"><em>launch 40+ more<\/em><\/a><em> as we speak.\u00a0But this is just the beginning. We want Common Voice to be a tool for any community to make speech technology available in their own language.<\/em><\/p>\n<p><a href=\"https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/Common-voice-1.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-1988\" src=\"https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/Common-voice-1.png\" alt=\"\" width=\"1600\" height=\"356\" srcset=\"https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/Common-voice-1.png 1600w, https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/Common-voice-1-300x67.png 300w, https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/Common-voice-1-768x171.png 768w, https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/Common-voice-1-600x134.png 600w, https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/Common-voice-1-1000x223.png 1000w\" sizes=\"(max-width: 1600px) 100vw, 1600px\" \/><\/a><\/p>\n<p>Since we launched Common Voice last July, we have collected hundreds of 
thousands of voice samples in English through our <a href=\"http:\/\/voice.mozilla.org\/\">website<\/a> and <a href=\"https:\/\/itunes.apple.com\/us\/app\/project-common-voice-by-mozilla\/id1240588326?mt=8\">iOS app<\/a>. Last November, we <a href=\"https:\/\/medium.com\/mozilla-open-innovation\/sharing-our-common-voice-mozilla-releases-second-largest-public-voice-data-set-e88f7d6b7666\">published the first version of the Common Voice dataset<\/a>. This data has been downloaded thousands of times, and we have seen the data being used in <a href=\"https:\/\/mycroft.ai\/blog\/mycroft-speech-to-text-and-balance\/\">commercial voice products<\/a> as well as open-source software like <a href=\"https:\/\/github.com\/kaldi-asr\/kaldi\">Kaldi<\/a> and our very own speech recognition engine, project <a href=\"https:\/\/github.com\/mozilla\/deepspeech\">Deep Speech<\/a>.<\/p>\n<p>Up until now, Common Voice has only been available for voice contributions in English. But the goal of Common Voice has always been to support many languages so that we may fulfil our vision of making speech technology more open, accessible, and inclusive for everyone. That is why our main effort these last few months has been around growing and empowering individual language communities to launch Common Voice in their parts of the world, in their local languages and dialects.<\/p>\n<p>In addition to localising the website, these communities are populating Common Voice with copyright-free sentences for people to read that have the required characteristics for a high quality dataset. 
They are also helping promote the site in their countries, building a community of contributors with the goal of growing the total hours of collected data available in each language.<\/p>\n<p><span style=\"text-decoration: line-through;\"><a href=\"https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/common-voice-2.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-1989\" src=\"https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/common-voice-2.jpg\" alt=\"\" width=\"1200\" height=\"800\" srcset=\"https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/common-voice-2.jpg 1200w, https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/common-voice-2-300x200.jpg 300w, https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/common-voice-2-768x512.jpg 768w, https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/common-voice-2-600x400.jpg 600w, https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/common-voice-2-1000x667.jpg 1000w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><\/a><\/span><\/p>\n<p>In addition to English, we are now collecting voice samples in French, German and Welsh. And there are already more than 40 other languages on the way &#8211; not only big languages like Spanish, Chinese or Russian, but also smaller ones like Frisian, Norwegian or Chuvash. For us, these smaller languages are important because they are often under-served by existing commercial digital and speech recognition services. Having data available can empower entrepreneurs and communities to address this gap on their own.<\/p>\n<p>Going multilingual marks a big step for Common Voice and we hope that it\u2019s also a big step for speech technology in general. Democratizing voice technology will lower not only the barrier to global innovation, but also the barrier to accessing information. 
This is especially true for people who have traditionally had less of this access, such as people with visual impairments, people who never learned to read, children, the elderly and many others.<\/p>\n<p>We are thrilled by the growing support for building the world\u2019s largest public multi-language voice dataset, and you can help us grow it by <a href=\"https:\/\/voice.mozilla.org\/record\">donating your voice<\/a>. You can also use the <a href=\"https:\/\/itunes.apple.com\/us\/app\/project-common-voice-by-mozilla\/id1240588326\">iOS app<\/a>. If you would like to help bring Common Voice and speech technology to your language, visit our <a href=\"https:\/\/voice.mozilla.org\/languages\">language page<\/a>. And if you are part of an organisation and have an idea for participating in this project, please get in touch.<\/p>\n<p>Our <a href=\"https:\/\/discourse.mozilla.org\/c\/voice\">Forum<\/a> gives more details on how to help, and is a great place to ask questions and meet the communities.<\/p>\n<p><strong>Special Thanks<\/strong><\/p>\n<p>We would like to thank our Speech Advisory Group, people who have been expert advisors and contributors to the Common Voice project:<\/p>\n<ul>\n<li>Francis Tyers &#8211; Assistant Professor of Computational Linguistics at the Higher School of Economics in Moscow.<\/li>\n<li>Gilles Adda &#8211; Speech scientist<\/li>\n<li>Thomas Griffiths &#8211; Digital Services Officer, Office of the Legislative Assembly, Australia<\/li>\n<li>Joshua Meyer &#8211; PhD candidate in Speech Recognition<\/li>\n<li>Delyth Prys &#8211; Language technologies at Bangor University research centre.<\/li>\n<li>Dewi Bryn Jones &#8211; Language technologies at Bangor University research centre.<\/li>\n<li>Wael Farhan &#8211; MS in Machine Learning from UCSD, currently doing research for Arabic NLP at Mawdoo3.com.<\/li>\n<li>Eren G\u00f6lge &#8211; Machine learning scientist currently working on TTS for Mozilla.<\/li>\n<li>Alaa Saade &#8211; 
Senior Machine Learning Scientist @ Snips (Paris)<\/li>\n<li>Laurent Besacier &#8211; Professor at Universit\u00e9 Grenoble Alpes, NLP, speech processing, low resource languages<\/li>\n<li>David van Leeuwen &#8211; Speech Technologist<\/li>\n<li>Benjamin Milde &#8211; PhD candidate in NLP\/speech processing<\/li>\n<li>Shay Palachy &#8211; M.Sc. in Computer Science, Lead Data Scientist in a startup<\/li>\n<\/ul>\n<p>***<\/p>\n<p><em>Common Voice complements Mozilla&#8217;s work in the field of speech recognition, which runs under the project name <strong>&#8220;<\/strong><\/em><a href=\"https:\/\/github.com\/mozilla\/DeepSpeech\"><strong><em>Deep Speech<\/em><\/strong><\/a><strong><em>&#8221;<\/em><\/strong><em>, an open-source speech recognition engine and model that approaches human accuracy and was released in November 2017. Together with the growing Common Voice dataset we believe this technology can and will enable a wave of innovative products and services, and that it should be available to everyone.<\/em><\/p>\n<p>&#8211;<\/p>\n<p>&nbsp;<\/p>\n<h3><em>More Common Voices (Welsh)<\/em><\/h3>\n<p>&nbsp;<\/p>\n<h2 id=\"Cymraeg\"><strong>Rhagor o Leisiau i Common Voice<\/strong><\/h2>\n<p><em>Heddiw mae\u2019n bleser gennym gyhoeddi fod <\/em><a href=\"http:\/\/voice.mozilla.org\/\"><em>Common Voice<\/em><\/a><em>, menter Mozilla i dorfoli set ddata fawr o leisiau dynol ar gyfer eu defnyddio mewn technoleg lleferydd, yn mynd i fod ar gael ar gyfer nifer o ieithoedd! Diolch i ymdrechion glew cymunedau lleoleiddio Mozilla a\u2019n partneriaid iaith ymrwymedig gallwch nawr gyfrannu eich llais mewn Cymraeg, Almaeneg a Ffrangeg, ac rydym yn gweithio i <\/em><a href=\"https:\/\/voice.mozilla.org\/languages\"><em>lansio 40+ yn ychwanegol<\/em><\/a><em> cyn bo hir.\u00a0Ond dim ond y dechrau yw hyn. 
Rydym eisiau i Common Voice fod yn arf ar gyfer unrhyw gymuned i greu technoleg lleferydd yn eu hiaith eu hun.<\/em><\/p>\n<p><a href=\"https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/welsh.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-1991\" src=\"https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/welsh.png\" alt=\"\" width=\"1600\" height=\"347\" srcset=\"https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/welsh.png 1600w, https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/welsh-300x65.png 300w, https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/welsh-768x167.png 768w, https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/welsh-600x130.png 600w, https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/welsh-1000x217.png 1000w\" sizes=\"(max-width: 1600px) 100vw, 1600px\" \/><\/a><\/p>\n<p>Ers i ni lansio Common Voice fis Gorffennaf diwethaf, rydym wedi casglu cannoedd o filoedd o samplau llais yn Saesneg drwy ein <u>gwefan<\/u> ac <a href=\"https:\/\/itunes.apple.com\/us\/app\/project-common-voice-by-mozilla\/id1240588326?mt=8\">ap iOS<\/a>. Fis Tachwedd y llynedd, fe wnaethom ni <a href=\"https:\/\/medium.com\/mozilla-open-innovation\/sharing-our-common-voice-mozilla-releases-second-largest-public-voice-data-set-e88f7d6b7666\">gyhoeddi fersiwn cyntaf set ddata Common Voice<\/a>. Mae\u2019r data hyn wedi cael eu llwytho lawr filoedd o weithiau, ac rydym wedi gweld y data yn cael eu defnyddio mewn <a href=\"https:\/\/mycroft.ai\/blog\/mycroft-speech-to-text-and-balance\/\">cynnyrch llais masnachol<\/a> fel <a href=\"https:\/\/github.com\/kaldi-asr\/kaldi\">Kaldi<\/a> yn ogystal \u00e2\u2019n meddalwedd cod agored ni ein hunain, <a href=\"https:\/\/github.com\/mozilla\/deepspeech\">Deep Speech<\/a>.<\/p>\n<p>Hyd yn hyn, mae Common Voice wedi bod ar gael dim ond ar gyfer cyfraniadau llais yn Saesneg. 
Ond nod Common Voice o\u2019r dechrau oedd cefnogi llawer o ieithoedd er mwyn gwireddu ein gweledigaeth o wneud technoleg lleferydd yn fwy agored, hygyrch a chynhwysol i bawb. Dyna pam fod ein prif ymdrechion yn ystod y misoedd diwethaf wedi canolbwyntio ar dyfu a grymuso cymunedau iaith unigol i lansio Common Voice yn eu rhannau nhw o\u2019r byd, yn eu hieithoedd a\u2019u tafodieithoedd eu hunain.<\/p>\n<p>Yn ychwanegol at leoleiddio\u2019r wefan, mae\u2019r cymunedau hyn yn helpu poblogi Common Voice gyda brawddegau i bobl eu darllen. Mae\u2019r brawddegau hyn yn rhydd o hawlfraint, ac mae ganddyn nhw\u2019r nodweddion cywir i greu set ddata o safon uchel. Mae\u2019r cymunedau hyn hefyd yn helpu hyrwyddo\u2019r wefan yn eu gwledydd eu hunain, gan adeiladu cymuned o gyfranwyr gyda\u2019r nod o dyfu\u2019r cyfanswm o oriau o ddata sydd wedi\u2019u casglu ac sydd ar gael ym mhob iaith.<\/p>\n<p><a href=\"https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/welsh-1.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-1992\" src=\"https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/welsh-1.jpg\" alt=\"\" width=\"1200\" height=\"800\" srcset=\"https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/welsh-1.jpg 1200w, https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/welsh-1-300x200.jpg 300w, https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/welsh-1-768x512.jpg 768w, https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/welsh-1-600x400.jpg 600w, https:\/\/blog.mozilla.org\/press-uk\/files\/2018\/06\/welsh-1-1000x667.jpg 1000w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><\/a><\/p>\n<p>Yn ychwanegol at Saesneg, rydym nawr yn casglu samplau llais mewn Cymraeg, Ffrangeg ac Almaeneg. Ac mae mwy na 40 iaith arall eisoes ar y ffordd &#8211; nid dim ond ieithoedd mawr fel Sbaeneg, Tseineeg neu Rwsieg, ond hefyd ieithoedd llai fel Ffriseg, Norwyeg neu Chuvash. 
I ni, mae\u2019r ieithoedd llai hyn yn bwysig oherwydd eu bod nhw yn aml heb gael digon o sylw gan y gwasanaethau adnabod lleferydd a digidol masnachol presennol. Gall bodolaeth data addas hefyd rymuso entrepreneuriaid a chymunedau i lenwi\u2019r bwlch hwn eu hunain.<\/p>\n<p>Mae mynd yn amlieithog yn gam mawr i Common Voice a gobeithiwn ei fod hefyd yn gam mawr i dechnoleg lleferydd yn gyffredinol. Bydd democrateiddio technoleg lleferydd nid yn unig yn lleihau\u2019r rhwystr sy\u2019n atal arloesedd byd-eang ond hefyd y rhwystr sy\u2019n atal pobl rhag cael mynediad at y wybodaeth. Yn arbennig felly pobl sydd yn draddodiadol wedi cael llai o fynediad &#8212; er enghraifft, pobl \u00e2 nam ar eu golwg, pobl na wnaeth erioed ddysgu darllen, plant, pobl h\u0177n, a llawer eraill.<\/p>\n<p>Rydym yn falch iawn o weld y gefnogaeth gynyddol sydd i ni adeiladu\u2019r set ddata amlieithog gyhoeddus fwyaf yn y byd, a gall pawb ein helpu i\u2019w dyfu drwy <a href=\"https:\/\/voice.mozilla.org\/record\">gyfrannu eich llais<\/a>. Os hoffech chi helpu i ddod \u00e2 Common Voice a thechnoleg lleferydd i\u2019ch iaith chi, ewch i\u2019n <a href=\"https:\/\/voice.mozilla.org\/languages\">tudalen iaith<\/a>. 
Ac os ydych yn rhan o sefydliad a bod gennych syniad ar gyfer cymryd rhan yn y project hwn, cysylltwch \u00e2 ni.<\/p>\n<p>Mae ein <a href=\"https:\/\/discourse.mozilla.org\/c\/voice\">Fforwm<\/a> yn rhoi mwy o fanylion ar sut i helpu, yn ogystal \u00e2 bod yn lle gwych i ofyn cwestiynau a chyfarfod \u00e2\u2019r cymunedau.<\/p>\n<p><strong>Diolch Arbennig<\/strong><\/p>\n<p>Hoffem ddiolch i\u2019n Gr\u0175p Ymgynghorol Lleferydd, pobl sydd wedi bod yn gyfranwyr ac yn ymgynghorwyr arbenigol i\u2019r project Common Voice:<\/p>\n<ul>\n<li>Francis Tyers &#8211; Athro Cynorthwyol Ieithyddiaeth Gyfrifiadurol yn yr Ysgol Economeg Uwch yn Moscow.<\/li>\n<li>Gilles Adda &#8211; Gwyddonydd lleferydd<\/li>\n<li>Thomas Griffiths &#8211; Swyddog Gwasanaethau Digidol, Swyddfa\u2019r Cynulliad Deddfwriaethol, Awstralia<\/li>\n<li>Joshua Meyer &#8211; ymgeisydd PhD mewn Adnabod Lleferydd<\/li>\n<li>Delyth Prys &#8211; Pennaeth Uned Technolegau Iaith, Prifysgol Bangor, Cymru<\/li>\n<li>Dewi Bryn Jones &#8211; Prif Beiriannydd Meddalwedd, Uned Technolegau Iaith, Prifysgol Bangor, Cymru<\/li>\n<li>Wael Farhan &#8211; MS mewn Dysgu Peiriant o UCSD, ar hyn o bryd yn gwneud ymchwil ar gyfer NLP Arabeg yn Mawdoo3.com.<\/li>\n<li>Eren G\u00f6lge &#8211; Gwyddonydd dysgu peirianyddol sydd ar hyn o bryd yn gweithio ar destun i leferydd i Mozilla.<\/li>\n<li>Alaa Saade &#8211; Uwch Wyddonydd Dysgu Peirianyddol yn Snips (Paris)<\/li>\n<li>Laurent Besacier &#8211; Athro yn Universit\u00e9 Grenoble Alpes, NLP, prosesu lleferydd, ieithoedd llai eu hadnoddau<\/li>\n<li>David van Leeuwen &#8211; Technolegydd Lleferydd<\/li>\n<li>Benjamin Milde &#8211; ymgeisydd PhD mewn NLP\/prosesu lleferydd<\/li>\n<li>Shay Palachy &#8211; M.Sc. 
mewn Cyfrifiadureg, Gwyddonydd Data Arweiniol mewn cwmni cychwynnol<\/li>\n<\/ul>\n<p>***<\/p>\n<p><em>Mae Common Voice yn cefnogi gwaith Mozilla ym maes adnabod lleferydd, sy\u2019n rhedeg dan yr enw project <strong>&#8220;<\/strong><\/em><a href=\"https:\/\/github.com\/mozilla\/DeepSpeech\"><strong><em>Deep Speech<\/em><\/strong><\/a><strong><em>&#8221;<\/em><\/strong><em>, model peiriant adnabod lleferydd cod agored sy\u2019n dod yn agos at gywirdeb dynol, a ryddhawyd ym mis Tachwedd 2017. Ar y cyd gyda\u2019r set ddata Common Voice rydym yn credu y gall ac y bydd y dechnoleg hon yn galluogi ton o gynnyrch a gwasanaethau arloesol, ac y dylai hyn fod ar gael i bawb.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today we are excited to announce that Common Voice, Mozilla\u2019s initiative to crowdsource a large dataset of human voices for use in speech technology, is going multilingual! Thanks to the &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/press-uk\/2018\/06\/07\/more-common-voices\/\">Read 
more<\/a><\/p>\n","protected":false},"author":493,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[121],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/press-uk\/wp-json\/wp\/v2\/posts\/1987"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/press-uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/press-uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/press-uk\/wp-json\/wp\/v2\/users\/493"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/press-uk\/wp-json\/wp\/v2\/comments?post=1987"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/press-uk\/wp-json\/wp\/v2\/posts\/1987\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/press-uk\/wp-json\/wp\/v2\/media?parent=1987"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/press-uk\/wp-json\/wp\/v2\/categories?post=1987"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/press-uk\/wp-json\/wp\/v2\/tags?post=1987"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}