VoxForge
Re: Librivox contributions and dates/numbers
If my lexicon contains SAY s eh, ONE w uh n, WON w uh n, TO t uw, TWO t uw, and I try to use the prompts SAY ONE, SAY WON, SAY TO, SAY TWO, I find it hard to imagine that, even given an infinite amount of data, a recognizer would ever be able to do better than 50-50 on SAY ONE. Perhaps I am wrong here?
The acoustic model doesn't care about the words, just the phones, so it is trained on /s eh w uh n/. The language model then says that, given the recognised phones /s eh w uh n/, you output SAY ONE.
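To make that split concrete, here is a minimal Python sketch, not how any particular recogniser implements it. The lexicon is the one from the question; the bigram probabilities and the helper names words_for_phones and pick_word are made up for illustration.

```python
# Lexicon from the question: word -> phone string.
LEXICON = {
    "SAY": "s eh",
    "ONE": "w uh n",
    "WON": "w uh n",
    "TO": "t uw",
    "TWO": "t uw",
}

# Hypothetical bigram probabilities P(word | previous word).
# These numbers are invented; a real language model is estimated from text.
BIGRAM = {
    ("SAY", "ONE"): 0.4,
    ("SAY", "WON"): 0.1,
    ("SAY", "TWO"): 0.4,
    ("SAY", "TO"): 0.1,
}


def words_for_phones(phones):
    """All words whose pronunciation matches the phone string (homophones)."""
    return sorted(w for w, p in LEXICON.items() if p == phones)


def pick_word(prev_word, phones):
    """Choose among homophones using only the language model score."""
    candidates = words_for_phones(phones)
    return max(candidates, key=lambda w: BIGRAM.get((prev_word, w), 0.0))


if __name__ == "__main__":
    print(words_for_phones("w uh n"))
    print(pick_word("SAY", "w uh n"))
```

The acoustic side only ever sees "w uh n"; the choice between ONE and WON comes entirely from the (here invented) language model numbers.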
The topic of the thread was really different pronunciations for words, so a better example might have been "SAY 101", which could be pronounced as "SAY ONE OH ONE", "SAY ONE HUNDRED AND ONE" or "SAY ONE HUNDRED ONE". In general it is better to allow all reasonable pronunciations and let the Viterbi alignment pick the best.
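As a rough illustration of letting the alignment choose, the toy Python sketch below scores each candidate pronunciation of "101" against a sequence of frames with a left-to-right Viterbi pass and keeps the best one. Everything here is an assumption for the sake of the example: the phone strings for OH, HUNDRED and AND, the frame labels, and the mismatch penalty are invented, and a real system would use acoustic model likelihoods per frame rather than a match/mismatch score.

```python
NEG_INF = float("-inf")

# Candidate pronunciations of "101". ONE uses the phones from the lexicon
# above; the phones for OH, HUNDRED and AND are hypothetical.
VARIANTS = {
    "ONE OH ONE": "w uh n ow w uh n".split(),
    "ONE HUNDRED AND ONE": "w uh n hh uh n d r ax d ae n d w uh n".split(),
    "ONE HUNDRED ONE": "w uh n hh uh n d r ax d w uh n".split(),
}


def viterbi_score(phones, frames, mismatch=-5.0):
    """Best score of aligning a phone sequence to frames, left to right.

    Each frame either stays in the current phone or advances to the next.
    A frame scores 0.0 if its label matches the phone, else `mismatch`.
    """
    n = len(phones)
    # score[i] = best log-score of being in phone i after the current frame
    score = [NEG_INF] * n
    score[0] = 0.0 if frames[0] == phones[0] else mismatch
    for obs in frames[1:]:
        new = [NEG_INF] * n
        for i in range(n):
            prev = max(score[i], score[i - 1] if i > 0 else NEG_INF)
            if prev > NEG_INF:
                new[i] = prev + (0.0 if obs == phones[i] else mismatch)
        score = new
    # The alignment must end in the last phone of the pronunciation.
    return score[-1]


def best_variant(frames):
    """Return the pronunciation variant with the highest alignment score."""
    return max(VARIANTS, key=lambda v: viterbi_score(VARIANTS[v], frames))


if __name__ == "__main__":
    # Invented frame labels for someone saying "one hundred one".
    frames = "w w uh uh n hh uh n n d r ax d w w uh uh n".split()
    print(best_variant(frames))
```

With all three variants allowed, the speaker is free to say any of them and the training transcription still lines up with the audio.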
Tony
--
Dr Tony Robinson, Founder Cantab Research Ltd
--- (Edited on 20-May-2012 5:50 pm [GMT+0100] by TonyR) ---