VoxForge
Re: More on Collecting Speech Audio for Free GPL Speech Corpus
I feel like elaborating a bit on what I wrote earlier:
"I suspect ...it's not a good idea to include audio which has a SNR too low for dictation to be feasible"
The minimum SNR at which today's speech recognition software can take dictation is a lot higher than the minimum SNR at which a human listener can take dictation.
But actually, the SNR that can be handled depends on the type of noise, because some noises are easier for the computer to deal with than others. Background speech is particularly hard.
Dictation is very challenging for the computer since there are so many possible sentences that the user might say. A small-vocabulary command-and-control application is easier for the computer, so it can work at a lower SNR.
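For anyone who wants to screen recordings by SNR, here is a rough sketch of how the number could be computed. It assumes you have a stretch of samples containing speech and a separate noise-only stretch (for example, silence before the speaker starts), and it treats the first as the signal. The function names and the `min_snr_db` parameter are made up for illustration; no particular threshold is implied, since the right cutoff depends on the noise type and on the task.

```python
import math

def mean_power(samples):
    """Average power (mean of squared amplitudes) of a list of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(x * x for x in samples) / len(samples)

def snr_db(speech_samples, noise_samples):
    """Estimate SNR in dB as 10*log10(P_speech / P_noise).

    Assumption: speech_samples approximates the speech signal and
    noise_samples is a noise-only stretch of the same recording.
    """
    p_speech = mean_power(speech_samples)
    p_noise = mean_power(noise_samples)
    if p_noise == 0:
        return math.inf   # no measurable noise
    if p_speech == 0:
        return -math.inf  # no measurable speech
    return 10.0 * math.log10(p_speech / p_noise)

def meets_threshold(snr, min_snr_db):
    """True if the recording's SNR is at or above a caller-chosen cutoff.

    min_snr_db is hypothetical: it would be set higher for dictation
    than for a small-vocabulary command-and-control task.
    """
    return snr >= min_snr_db
```

With synthetic samples where the speech amplitude is ten times the noise amplitude, `snr_db` comes out at about 20 dB, because power scales with the square of amplitude.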
--- (Edited on 11/29/2007 6:00 pm [GMT-0600] by DavidGelbart) ---