From Human Interact's website:
"Commander is a virtual reality title driven by human speech. The audience is given agency in the middle of a sci-fi story, as part of a military embroiled in a dark intergalactic war. You're in command of a secretive mission, and your decisions have deadly consequences."
The site has a very cool trailer, showing how you can command your very own ship like Captain Kirk.
Speech recognition and understanding are supplied by Microsoft's Custom Speech Service, a new speech service that lets you create customized acoustic and language models... Looks like open source has had it right all along, because we've always known that customized acoustic and language models work best for users...
In Uganda, researchers are using speech recognition to monitor radio broadcasts and analyze local issues. They listen to broadcasts in locally accented English and in native languages.
From the article: "We are focussing on the open source software HTK as a platform for [the speech recognition component]."
Mozilla has pivoted Vaani to be the voice of IoT. Vaani was originally an "on-device" virtual assistant for Firefox OS. Now they have three new projects related to creating a virtual assistant for the Internet of Things:
DeepSpeech: an open source speech recognition engine based on Baidu's Deep Speech research, which will use Google's TensorFlow machine learning framework. It's currently in early development (a rough sketch of the approach follows after this list).
Pipsqueak: a longer-term goal to create a new speech recognition engine that implements cutting-edge technology, allowing Vaani to work completely offline while still providing the high-quality speech recognition users have become used to.
Murmur: a simple webapp for collecting speech samples to train speech recognition engines. They want to slowly build a speech corpus to train their open source models.
One thing to note: although they want to create their own speech corpus, for now they plan to use a purchased speech corpus for their acoustic models.
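To make the DeepSpeech item above concrete, here is a minimal sketch of the Deep Speech idea in TensorFlow. This is not Mozilla's actual code; the layer sizes and feature counts are illustrative assumptions. The core idea is a recurrent acoustic model that maps audio frames to per-frame character probabilities, trained with CTC loss so no frame-level alignment is needed.

```python
# A minimal Deep Speech-style acoustic model sketch (assumptions noted):
# MFCC frames in, per-frame character logits out, trained with CTC.
import tensorflow as tf

NUM_FEATURES = 26   # MFCC coefficients per audio frame (assumption)
NUM_CHARS = 28      # a-z, space, apostrophe (assumption)

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(None, NUM_FEATURES)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(256, return_sequences=True)),
    tf.keras.layers.Dense(NUM_CHARS + 1),  # +1 for the CTC blank label
])

def ctc_loss(labels, logits, label_lengths, logit_lengths):
    # CTC sums over all possible alignments between the frame-level
    # character logits and the (shorter) character transcript.
    return tf.nn.ctc_loss(labels, logits, label_lengths, logit_lengths,
                          logits_time_major=False, blank_index=-1)
```

The appeal of this end-to-end approach is that it learns straight from audio and text pairs, which is exactly why a crowd-sourced corpus like the one Murmur aims to collect matters so much.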
Mycroft AI, Inc. has released an open source platform called Mycroft Core that promises to let users "use natural language to control the Internet of Things". The Mycroft framework also includes an intent parser called Adapt (quick example below) and a TTS engine (based on CMU's Flite) called Mimic. For speech recognition they currently use Google's cloud-based speech recognition service.
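Here is a short sketch of how Adapt's intent parsing works, following the patterns in the adapt README. The entity names and vocabulary are my own IoT-flavored example, not Mycroft's:

```python
# Parse a recognized utterance into a structured intent with Adapt.
from adapt.intent import IntentBuilder
from adapt.engine import IntentDeterminationEngine

engine = IntentDeterminationEngine()

# Register the vocabulary the parser should recognize.
for word in ["turn", "switch"]:
    engine.register_entity(word, "ActionKeyword")
for word in ["on", "off"]:
    engine.register_entity(word, "OnOff")
for word in ["light", "heater", "fan"]:
    engine.register_entity(word, "Device")

# An intent that needs an action, a state, and a device to match.
device_intent = IntentBuilder("DeviceIntent") \
    .require("ActionKeyword") \
    .require("OnOff") \
    .require("Device") \
    .build()
engine.register_intent_parser(device_intent)

# Feed it the text that came back from the speech recognizer.
for intent in engine.determine_intent("turn on the light"):
    if intent.get("confidence") > 0:
        print(intent["Device"], intent["OnOff"])  # -> light on
```

The nice thing about this keyword/entity design is that it sits behind any speech recognizer: swap Google's cloud service for an open source engine and the intent layer doesn't change.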
They've also created a reference hardware implementation based on the Raspberry Pi and Arduino, and have run successful Kickstarter and Indiegogo campaigns to raise funds.
Since their stated goal is to provide an open source alternative to the likes of the Amazon Echo, the Mycroft AI group has started a new initiative called OpenSTT (an open source speech-to-text project) looking to create "open source speech-to-text models"... likely for Kaldi.
From a Toronto Star article (http://www.thestar.com/news/world/2016/01/27/deepmind-computer-program-beats-humans-at-go.html):
"As Hassabis told reporters, the same principles AlphaGo uses have many
applications, from better digital personal assistants to improved
medical diagnostics and far, far beyond. Because the algorithm is
general-purpose, it could respond nimbly to complex information like
voice instructions, for example. "
From general reading, it looks like AlphaGo uses two neural networks working together: a policy network that prunes the search space down to promising moves, and a value network that evaluates board positions to judge the next move (rough sketch below).
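Here is a rough sketch of how the two networks cooperate during tree search. This is my simplification of the selection rule described in the AlphaGo paper, not DeepMind's code: the policy network supplies a prior P(s, a) that steers the search toward promising moves, and the value network's position score feeds into Q instead of playing every game out to the end.

```python
# Simplified PUCT-style child selection (assumption: trimmed-down form
# of the rule in the AlphaGo paper; Node is a stand-in structure).
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float            # P(s, a) from the policy network
    q_value: float = 0.0    # mean evaluation from the value network
    visits: int = 0
    children: list = field(default_factory=list)  # child Nodes

def select_child(node, c_puct=1.0):
    """Pick the child maximizing Q + U: rarely visited moves with a
    high policy prior get a large exploration bonus U, which is how
    the policy network prunes the search space."""
    total = sum(c.visits for c in node.children)
    def puct(c):
        u = c_puct * c.prior * math.sqrt(total) / (1 + c.visits)
        return c.q_value + u
    return max(node.children, key=puct)
```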
New speech recognition cloud services:
HP: IDOL (Intelligent Data Operating Layer) Speech Recognition API
Amazon: Alexa Voice Service (AVS)
NTT Com: SkyWay
HPE: Haven OnDemand Speech Recognition
IBM: Watson Dialog service (GitHub dialog tool)
Code-Q is working on a Speech Recognition API for Qt using PocketSphinx. Source repository.
Looks like Mozilla is working on a speech recognition front end called Vaani that will allow users to submit speech in different languages directly from Firefox. This is amazing news for open source speech recognition.
Kelly Davis says that they will make the speech corpus and acoustic models available by the end of this year (2015).
MOVI (My Own Voice Interface) is an offline speech recognizer and voice synthesizer that adds voice control functionality to any Arduino project.
What is interesting is their approach to training the on-board acoustic model:
MOVI's Arduino API sends the training sentences in textual form over the serial connection to the shield. The shield phonetizes sentences using a 2GB dictionary. The phoneme sequences are used to create a temporal model that assigns higher probabilities to phoneme sequences that occurred in the trained sentences than to those that didn't.
Given that they say they are using open source algorithms, which they intend to provide when the shield is released, it will be interesting to see how they've implemented this.
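In the meantime, here is one way to read that description. This is a guess at the mechanism, not MOVI's released code: look each training sentence up in a pronouncing dictionary, then build a phoneme-bigram model that gives extra weight to transitions seen in training. The toy lexicon below is my own stand-in for MOVI's 2GB dictionary.

```python
# Hypothetical reconstruction of the "temporal model" described above.
from collections import defaultdict

# Toy pronouncing dictionary; MOVI reportedly ships a 2GB one.
LEXICON = {
    "turn":  ["T", "ER", "N"],
    "on":    ["AA", "N"],
    "light": ["L", "AY", "T"],
}

def phonetize(sentence):
    return [p for w in sentence.lower().split() for p in LEXICON[w]]

def train(sentences, boost=10.0):
    # Bigrams that occurred in a training sentence get `boost` mass,
    # so trained phoneme sequences score higher than unseen ones.
    model = defaultdict(float)
    for s in sentences:
        seq = phonetize(s)
        for a, b in zip(seq, seq[1:]):
            model[(a, b)] += boost
    return model

def score(model, sentence):
    seq = phonetize(sentence)
    return sum(model.get((a, b), 1.0) for a, b in zip(seq, seq[1:]))

temporal_model = train(["turn on light"])
print(score(temporal_model, "turn on light"))   # high: all bigrams trained
print(score(temporal_model, "light on turn"))   # lower: unseen transitions
```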