VoxForge
Just a thought:
I was reading the information about speaker dependent and speaker independent models on:
http://www.voxforge.org/home/dev
and it occurred to me that people who want to train the model to better recognise their voices are prime donors. If an interface collects the samples needed to adapt the model to an individual's voice, the hard part is already done, and many users would likely submit those samples if asked.
I realise that this isn't immediately useful, but the idea is that, in the future, speech-recognition/desktop-control applications will be derived from this project. A person installing a speech recognition program is likely to expect to spend a decent amount of time (10 minutes? 30?) training it to their voice. It would be worth keeping in mind that we want to collect that raw audio in a useful format and ask the user to submit it to VoxForge.
--- (Edited on 4/10/2008 4:33 am [GMT-0500] by Luna-Tick) ---
Well, everything here is correct. Moreover, with 30 minutes of a user's speech, it's better to adapt the generic model to their voice and get a very good speaker-dependent model. Such a service is indeed very promising, and we could do it right now. The only obstacle is the processing resources needed for model adaptation.
--- (Edited on 4/10/2008 11:54 pm [GMT-0500] by nsh) ---
Hi Luna-Tick,
Excellent point - this is kind of the "Holy Grail" (Monty Python's version) of what we are trying to do.
We need to give users something that they need (i.e. speech recognition), allow them to tailor it to their environment (adapt general acoustic models to make them work better with their voice), and if they want to give back to the community, allow them to easily upload their adaptation speech recordings to VoxForge, or any other similar community.
Ken
--- (Edited on 4/15/2008 1:06 pm [GMT-0400] by kmaclean) ---
A person records dictation to a wav file, analyzes it using the existing database, takes the resulting text, and fixes the mistakes. Now they have a perfect match between voice and text, so they send it in and improve the system.
Also, if the error rate is high, how about correcting just the first few hundred words? Now you have a perfect match, so submit that speech/text to the database (wait for it to be incorporated) and re-analyze the remaining wav. The next few hundred words should come out better; correct, submit, repeat.
There is the matter of syncing the wav to the text, but I am still not sure how much of an issue that will be.
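The correct-submit-repeat loop above can be sketched as a toy Python program. The recognizer and adaptation step here are stand-ins (a real system would call a decoder such as Julius and adapt actual acoustic models), but the control flow matches the proposal: transcribe a chunk, let the user correct it, fold the corrections back in, then move to the next chunk.

```python
def transcribe(frames, model):
    """Stand-in recognizer: return the model's best guess per frame."""
    return [model.get(f, "???") for f in frames]

def adapt(model, frames, corrected_words):
    """Stand-in adaptation: learn from the user-corrected alignments."""
    model.update(zip(frames, corrected_words))
    return model

def iterative_correction(frames, truth, chunk=3):
    """Process the recording chunk by chunk: transcribe, correct, adapt."""
    model = {}          # starts empty; grows as corrections arrive
    transcript = []
    for start in range(0, len(frames), chunk):
        seg = frames[start:start + chunk]
        guess = transcribe(seg, model)          # later chunks guess better
        corrected = truth[start:start + chunk]  # the user fixes mistakes here
        model = adapt(model, seg, corrected)
        transcript.extend(corrected)
    return transcript, model

frames = ["f1", "f2", "f1", "f3", "f2", "f1"]
truth  = ["the", "cat", "the", "sat", "cat", "the"]
text, model = iterative_correction(frames, truth)
```

The point of the sketch is that each submitted chunk improves recognition of the chunks that follow, so the correction effort per chunk should shrink over time.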
--- (Edited on 6/17/2008 9:38 pm [GMT-0500] by CarlFK) ---
I would be one of those self-interested trainers. I want an open source, Linux-friendly speech recognition product. I am happy to contribute a bit of time on the website, but downloading Julius and such seemed daunting at first. I might get to it, but right now I'm just going to read on the site.
What might be nice for us self-interested persons would be the option of adding our own text and recording it in a web-based environment similar to the one currently used: http://www.voxforge.org/home/read
There might need to be a filter or vocabulary-proofing function before the data is used, but at least this way I know the words & combinations of words that I use will be accounted for.
Shouldn't be too hard to set up, but I can't do that part of it.
--- (Edited on 10/23/2008 2:57 pm [GMT-0500] by rwtobey) ---
Hi rwtobey,
Thanks for the submissions and the suggestion (which I added to trac: ticket 436)
>but at least this way I know the words & combinations of words that I use
>will be accounted for.
If you've submitted a reasonable number of prompts, with a good variety of phonemes in different contexts, you don't necessarily need a recording of the actual words you want for your environment. However, having an acoustic model trained on the actual words you want recognized certainly improves the chances that they will be recognized correctly...
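As a rough illustration of the phoneme-variety point, here is a hypothetical Python sketch that measures what fraction of a phoneme inventory a set of recorded prompts covers. The dictionary and inventory here are toy assumptions; a real check would look words up in a full lexicon such as CMUdict.

```python
# Toy pronunciation dictionary: word -> phoneme sequence (assumed, not real CMUdict).
TOY_DICT = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
    "mat": ["M", "AE", "T"],
}
# Toy phoneme inventory the acoustic model needs examples of.
INVENTORY = {"DH", "AH", "K", "AE", "T", "S", "M", "B"}

def phoneme_coverage(prompts):
    """Return the fraction of the inventory heard across the prompts."""
    seen = set()
    for prompt in prompts:
        for word in prompt.lower().split():
            seen.update(TOY_DICT.get(word, []))
    return len(seen & INVENTORY) / len(INVENTORY)

cov = phoneme_coverage(["the cat sat", "the mat"])
```

A low score would suggest adding prompts that exercise the missing phonemes, which is roughly what a vocabulary-proofing step could report back to the submitter.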
Ken
--- (Edited on 10/27/2008 10:49 pm [GMT-0400] by kmaclean) ---