VoxForge
Re: Librivox contributions and dates/numbers
I maintain that a recognizer would have a problem distinguishing between SAY ONE and SAY WON, since they have identical phoneme representations.
This is the job of the language model that is used at run time, and it has nothing to do with acoustic model training.
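To make the point concrete, here is a minimal sketch of how a language model separates two words that look identical acoustically. The lexicon entries, the context words NUMBER and WE, and all probabilities are invented for illustration; they are not taken from any VoxForge model.

```python
import math

# Hypothetical toy lexicon: both words share one phoneme string,
# so the acoustic model alone cannot tell them apart.
LEXICON = {
    "ONE": ("W", "AH", "N"),
    "WON": ("W", "AH", "N"),
}

# Hypothetical bigram log-probabilities (values made up for this sketch).
BIGRAM_LOGPROB = {
    ("NUMBER", "ONE"): math.log(0.20),
    ("NUMBER", "WON"): math.log(0.001),
    ("WE", "WON"): math.log(0.05),
    ("WE", "ONE"): math.log(0.002),
}


def pick_word(previous_word, candidates):
    """Choose the candidate the language model prefers after previous_word."""
    return max(candidates, key=lambda w: BIGRAM_LOGPROB[(previous_word, w)])


homophones = ["ONE", "WON"]
print(pick_word("NUMBER", homophones))
print(pick_word("WE", homophones))
```

The acoustic scores for the two candidates are equal by construction, so the choice is made entirely by the run-time language model.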
So the master lexicon will need to contain a subset of 1066, TEN, SIXTY, SIX, THOUSAND, AND, ONE, TEN_SIXTY_SIX and so on. I'm just concerned that the 1066/TEN_SIXTY_SIX approach (which of course is perfectly valid) for numbers means a perpetually growing lexicon.
You can create a network that has branches for all the pronunciations of 1066 in terms of ordinary words and then force-align against that. In this way you keep the pronunciation dictionary small. But even if you have tens of thousands of hours of data to align, it's not too bad to expand out 1066 and all other troublesome cases many times in the dictionary used for alignment.
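As a rough sketch of the second option (expanding 1066 in the alignment dictionary), the snippet below builds every phoneme-level pronunciation of a number token from the pronunciations of ordinary words. The names BASE_LEXICON, READINGS, expand_token and alignment_dictionary_lines are hypothetical, the phoneme strings are approximate ARPAbet-style entries, and the list of readings for 1066 is only an example; a real setup would use whatever dictionary format the alignment tool expects.

```python
from itertools import product

# Small base lexicon of ordinary words (illustrative ARPAbet-style phonemes).
# A word may have more than one pronunciation.
BASE_LEXICON = {
    "TEN": [["T", "EH", "N"]],
    "SIXTY": [["S", "IH", "K", "S", "T", "IY"]],
    "SIX": [["S", "IH", "K", "S"]],
    "ONE": [["W", "AH", "N"]],
    "THOUSAND": [["TH", "AW", "Z", "AH", "N", "D"]],
    "AND": [["AE", "N", "D"], ["AH", "N", "D"]],
}

# Assumed spoken readings of the token; each one is a branch of the network.
READINGS = {
    "1066": [
        "TEN SIXTY SIX",
        "ONE THOUSAND AND SIXTY SIX",
        "ONE THOUSAND SIXTY SIX",
    ],
}


def expand_token(token):
    """Return every phoneme sequence the token can be read as."""
    pronunciations = []
    for reading in READINGS[token]:
        per_word = [BASE_LEXICON[word] for word in reading.split()]
        # Combine the alternative pronunciations of each word in the reading.
        for combo in product(*per_word):
            pronunciations.append([ph for word_pron in combo for ph in word_pron])
    return pronunciations


def alignment_dictionary_lines(token):
    """Format the expansions as 'TOKEN phoneme phoneme ...' lines."""
    return [f"{token} {' '.join(pron)}" for pron in expand_token(token)]


for line in alignment_dictionary_lines("1066"):
    print(line)
```

The master lexicon still only holds the ordinary words; the expanded entries for 1066 exist only in the dictionary used for alignment.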
Tony
--
Dr Tony Robinson, Founder Cantab Research Ltd
--- (Edited on 20-May-2012 6:33 pm [GMT+0100] by TonyR) ---