Mozilla releases dataset and model to lower voice-recognition barriers

The browser maker has collected nearly 500 hours of speech to help voice-recognition projects get off the ground.

Mozilla has released its Common Voice collection, which contains almost 400,000 recordings from 20,000 people and which Mozilla claims is the second-largest publicly available voice dataset.
The voice samples in the collection came from Mozilla's Common Voice project, which let users donate recordings of their voices through an iOS app or a website. Mozilla hopes that a large public dataset will lead to better voice-enabled applications.
"One reason so few services are commercially available is a lack of data," Mozilla senior vice president of emerging technologies Sean White said in a blog post.
"Startups, researchers, or anyone else who wants to build voice-enabled technologies need high-quality, transcribed voice data on which to train machine-learning algorithms. Right now, they can only access fairly limited data sets."
At the moment, the collection is focused on English, but there are plans to extend it to other languages in the first half of 2018.
Alongside the dataset, Mozilla released Project DeepSpeech, its open-source voice-recognition model, which is based on research by Chinese internet giant Baidu. Mozilla claims that DeepSpeech, with a 6.5 percent word error rate on the LibriSpeech dataset, is approaching human-level recognition.
In August, Microsoft said it had reached a voice-recognition error rate of 5.1 percent on the Switchboard corpus, the same level as professional human transcribers.
Despite the new milestone, Microsoft acknowledges that machines still find it tough to recognise different accents and speaking styles, and don't perform well in noisy conditions.
Earlier in the year, Google said it had a 4.9 percent error rate in its speech-recognition software.
Samsung has said it is looking to use voice recognition throughout its home appliance line-up by 2020, and recently partnered with Kakao to cooperate on AI and voice recognition.