
Speech Recognition in the News



MLCommons: People’s Speech Dataset
By kmaclean - 3/30/2021 - 1 Reply

From the MLCommons website:

The People’s Speech Dataset is the world’s largest labeled open speech dataset and includes 87,000+ hours of transcribed speech in 59 different languages with a diverse set of speakers. This open dataset is large enough to train speech-to-text systems and crucially will be available with a permissive license. Just as ImageNet catalyzed machine learning for vision, the People’s Speech will unleash innovation in speech research and products that are available to users across the globe.

Facebook's VoxPopuli Corpus
By kmaclean - 1/22/2021 - 1 Reply

Facebook to release huge multilingual corpus of unlabelled speech data (paper - pub date: Jan 2021):

VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised representation learning as well as semi-supervised learning. VoxPopuli also contains 1.8K hours of transcribed speeches in 16 languages and their aligned oral interpretations into 5 other languages totaling 5.1K hours. We provide speech recognition baselines and validate the versatility of VoxPopuli unlabelled data in semi-supervised learning under challenging out-of-domain settings. We will release the corpus at [...] under an open license.

Google launches Speech Commands Dataset
By kmaclean - 8/25/2017

Google has created a free and open dataset called the Speech Commands Dataset. It is targeted at neural network beginners, allowing them to build models for simple keyword detection.

From the Google blog:

The dataset has 65,000 one-second long utterances of 30 short words, by thousands of different people, contributed by members of the public through the AIY website. It’s released under a Creative Commons BY 4.0 license, and will continue to grow in future releases as more contributions are received. The dataset is designed to let you build basic but useful voice interfaces for applications, with common words like “Yes”, “No”, digits...
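One-second clips like these are typically converted into spectrogram features before a keyword-detection model is trained on them. Below is a minimal, self-contained sketch of that front end using only NumPy on a synthetic clip; the 25 ms frame and 10 ms hop are common choices for illustration, not anything the dataset prescribes.

```python
import numpy as np

def log_spectrogram(wave, sr=16000, frame_ms=25, hop_ms=10):
    """Frame a 1-D waveform and compute a log-power spectrogram."""
    frame_len = int(sr * frame_ms / 1000)   # 400 samples per frame
    hop = int(sr * hop_ms / 1000)           # 160 samples between frames
    n_frames = 1 + (len(wave) - frame_len) // hop
    window = np.hanning(frame_len)          # taper each frame to reduce leakage
    frames = np.stack([wave[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10)            # small floor avoids log(0)

# One second of synthetic audio standing in for a dataset clip
sr = 16000
t = np.arange(sr) / sr
clip = np.sin(2 * np.pi * 440 * t).astype(np.float32)

feats = log_spectrogram(clip)
print(feats.shape)  # (98, 201): 98 time frames x 201 frequency bins
```

The resulting 2-D array of frames by frequency bins is what a small convolutional or fully connected classifier would consume, one spectrogram per one-second utterance.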


Mozilla Project: Common Voice
By kmaclean - 6/28/2017

Mozilla has created a new project called "Common Voice" with the goal of collecting 10,000 hours of speech. They have created a very nice web app to collect and validate submitted speech. From their website:

What is Common Voice?

[...] Common Voice is a project to make voice recognition technology easily accessible to everyone. People donate their voices to a massive database that will let anyone quickly and easily train voice-enabled apps. All voice data will be available to developers.


When will the dataset be available?

Mozilla aims to begin to capture voices in June and release the open source database later in 2017.

All speech submitted will be released under the CC-0 license (public domain).  They are starting with English, and acknowledge the need to add more languages.

Starship Commander - Speech Recognition powered Virtual Reality game
By kmaclean - 2/10/2017

From Human Interact's website:

Human Interact's first game: Starship Commander

Starship Commander is a virtual reality title driven by human speech. The audience is given agency in the middle of a sci-fi story, as part of a military embroiled in a dark intergalactic war. You’re in command of a secretive mission, and your decisions have deadly consequences.

The site has a very cool trailer, showing how you can command your very own ship like Captain Kirk.

Speech recognition and understanding is supplied by Microsoft's Custom Speech Service, a new speech service that lets you create customized acoustic and language models... Looks like Open Source has had it right all along, because we've always known that customized acoustic and language models work best for users...

Decoding radio talk in Uganda
By colbec - 12/4/2016

In Uganda, researchers are using speech recognition to analyze radio broadcasts about local issues, listening to programs in locally accented English and in native languages.

From the article: "We are focussing on the open source software HTK as a platform for [the speech recognition component]."

Apple jack ax ushers in a voice-driven world (Yahoo News)
By colbec - 9/8/2016

Mozilla's Vaani: Voice of IoT
By kmaclean - 6/28/2016

Mozilla has pivoted Vaani to be the Voice of IoT. Vaani was originally an "on-device" virtual assistant for FirefoxOS. Now they have three new projects related to creating a virtual assistant for the Internet of Things:

DeepSpeech: an open source speech recognition engine. It is based on Baidu's Deep Speech research and will use Google's TensorFlow machine learning framework. It is currently in early development.

Pipsqueak: a longer-term goal to create a new speech recognition engine implementing cutting-edge technology, allowing Vaani to work completely offline while still providing the high-quality speech recognition users have come to expect.

Murmur: a simple webapp for collecting speech samples to train speech recognition engines.  They want to slowly build a speech corpus to train their open source models.

One thing to note is that although they want to create their own speech corpus, for now they are planning to use a purchased speech corpus for their acoustic models.

Mycroft AI and OpenSST
By kmaclean - 6/17/2016

Mycroft AI, Inc. has released an open source platform called Mycroft core that promises to allow users to "use natural language to control the Internet of Things". The Mycroft framework also includes an intent parser called Adapt and a TTS engine (based on CMU's Flite) called Mimic. For speech recognition, they are currently using Google's cloud-based speech recognition service.

They've also created a reference hardware implementation based on the Raspberry Pi and Arduino, and have run successful Kickstarter and Indiegogo campaigns to raise funds.

Since their stated goal is to provide an open source alternative to the likes of the Amazon Echo, the Mycroft AI group has started a new initiative called OpenSST (an Open Source Speech To Text project) looking to create "open source speech-to-text models"... likely for Kaldi.

AlphaGO and voice
By colbec - 1/28/2016

From a Toronto Star article:

"As Hassabis told reporters, the same principles AlphaGo uses have many applications, from better digital personal assistants to improved medical diagnostics and far, far beyond. Because the algorithm is general-purpose, it could respond nimbly to complex information like voice instructions, for example. "

From general reading, it looks like AlphaGo uses two neural networks working together: a policy network to prune the search space and a value network to evaluate candidate moves.
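That division of labour, with one model narrowing the candidates and another scoring the resulting positions, can be illustrated with a toy sketch. Everything here is invented for illustration: the centre-preference "policy" heuristic, the averaging "value" function, and the integer moves bear no relation to AlphaGo's actual networks or to Go itself.

```python
# Toy stand-ins for the two-network idea (purely illustrative).
def policy_prior(position, moves):
    """Hypothetical 'policy': score candidate moves, preferring the centre."""
    return {m: 1.0 / (1 + abs(m - 4)) for m in moves}

def value_estimate(position):
    """Hypothetical 'value': score a position as the mean of its moves."""
    return sum(position) / (len(position) or 1)

def select_move(position, moves, top_k=3):
    priors = policy_prior(position, moves)
    # The policy prunes the search space to the top-k candidates...
    candidates = sorted(moves, key=priors.get, reverse=True)[:top_k]
    # ...and the value function evaluates each resulting position.
    scored = {m: value_estimate(position + [m]) for m in candidates}
    return max(scored, key=scored.get)

print(select_move([], moves=list(range(9))))  # -> 5
```

The point of the sketch is the shape of the computation: rather than evaluating all nine moves, the value function is only consulted for the three the policy kept, which is how pruning makes deep search tractable.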