VoxForge
Hi again,
One more thing: thinking about the voice donation process, I have an idea. Let's see whether you think it could be viable.
In English there are some programs that do voice recognition, but in Spanish there is NO program that can do anything. (This is the case for a lot of languages.)
If a potential voice donor sees how little recorded speech we have in a language like Spanish, reaching the first 140-hour objective looks very difficult.
Then a person, like me, could forget about donating, or could instead think about preparing training data for sphinx2, sphinx3 or julian, in order to get basic recognition for his own voice... but the process of creating this is not easy.
Could VoxForge offer an application that helps people through this whole training process? Say the first step is the read applet: a person records his voice (100 phrases would be about 10 minutes of speech), and VoxForge keeps all of this audio under the GNU license. Then VoxForge processes that one person's speech and prepares, with HTK, the HMM files that this person can use with any of the voice recognition engines (the ones selected by VoxForge). This could be a little gift that VoxForge gives to a voice donor... and it could help convince a person to donate his voice.
If this process can be done automatically, we could post messages in some Linux "freak" forums, explaining the easy method of getting voice recognition on their desktop...
What do you think about this?!? Could it be a way to persuade people to donate their voice?
Regards.
--- (Edited on 10/29/2008 5:19 pm [GMT-0500] by ubanov) ---
(This discussion originally started this thread)
Hi Ubanov,
>... it seems that it's very difficult to reach the first 160 hours objective.
It's actually 140 hours, but still a high number...(Note that this is only for a command and control system).
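To put the 140-hour figure in perspective, here is a rough back-of-the-envelope calculation. It assumes every donor contributes one session of about 10 minutes (the 100-phrase session mentioned in this thread); the function name is made up for illustration and real sessions will vary in length.

```python
import math

# Rough estimate only: both figures come from this thread, not from measured data.
TARGET_HOURS = 140       # corpus goal discussed in the thread
MINUTES_PER_DONOR = 10   # about 100 prompts per session (assumption)


def donors_needed(target_hours: float, minutes_per_donor: float) -> int:
    """Return how many equal-sized sessions are needed to reach the target."""
    total_minutes = target_hours * 60
    # Round up: a partially filled last session still needs one more donor.
    return math.ceil(total_minutes / minutes_per_donor)


print(donors_needed(TARGET_HOURS, MINUTES_PER_DONOR))  # 840
```

Under these assumptions, several hundred donors would be needed per language, which is why the target looks far away for a language with few submissions.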
> Then a person could record his voice (100 phrases could be about 10
>minutes of voice) [...] Then let's say that Voxforge processes the voice of
>one person and prepare with HTK, the HMM files that this person could
>use it in order to use any of the voice recognition engines (the selected
>by voxforge). This could be a little gift that voxforge gives to a voice
>donor... and it could help convince a person to donate his voice.
Interesting, though I am not sure I understand the advantage of creating speaker dependent acoustic models with only a single person's speech... You would need more speech than 10 minutes to create a good user-specific acoustic model.
Allowing people to "adapt" the generic VoxForge acoustic model to their voice would probably make more sense...
In addition, since people are continuously adding speech to the English VoxForge corpus, and acoustic models are re-trained every night, users are getting acoustic models that are trained with their voice... whenever I get around to validating and submitting their submission to the Subversion Speech Corpus repository.
This validation of submitted audio is the current bottleneck that I am working to resolve... We need a web-enabled way to allow users to validate the audio (and correct the prompts, and readme entries...) in submissions and mark them for inclusion (or exclusion) in the nightly build of the acoustic model.
thanks,
Ken
--- (Edited on 10/30/2008 12:15 pm [GMT-0400] by kmaclean) ---
Hi again,
I think it would be better to adapt the generic VoxForge acoustic model to their voice, but today there is nothing to adapt, because we don't have enough speech. So the only way to make a simple voice recognition program in Spanish is to build a speaker dependent acoustic model, isn't it?
I don't know how many hours of speech are needed to make a simple command recognition program...
For people to use the recognition, there needs to be some voice application that can control a Linux desktop... writing a simple one could be easy (a program that handles a small grammar: open program, close program, and so on).
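As a minimal sketch of the kind of small grammar described here, the following maps recognized text to an action and a program name. Everything in it is hypothetical: the verb table, the Spanish verbs, and the function name are illustrative only, and it assumes the recognizer (Julius, Sphinx, or anything else) has already returned the utterance as plain text. It does not launch or close anything itself.

```python
from typing import Optional, Tuple

# Hypothetical grammar: "<verb> <program name>", e.g. "open terminal".
# The Spanish verbs are an assumption added for illustration.
VERBS = {
    "open": "launch",
    "close": "terminate",
    "abrir": "launch",
    "cerrar": "terminate",
}


def parse_command(utterance: str) -> Optional[Tuple[str, str]]:
    """Return (action, program) for a recognized utterance, or None if it is outside the grammar."""
    words = utterance.lower().split()
    # The grammar needs a known verb followed by at least one word.
    if len(words) < 2 or words[0] not in VERBS:
        return None
    return VERBS[words[0]], " ".join(words[1:])


print(parse_command("open terminal"))  # ('launch', 'terminal')
print(parse_command("cerrar gedit"))   # ('terminate', 'gedit')
print(parse_command("hello"))          # None
```

A real desktop controller would pass the returned pair to whatever mechanism actually starts or stops programs; that part is left out on purpose.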
[...]
Regards.
--- (Edited on 10/30/2008 12:17 pm [GMT-0400] by kmaclean) ---
(from buhochileno)
Hi:
Spanish voice recognition for command-and-control apps is possible with the gnome-voice-control applet (open/close programs, move the mouse, etc.). I worked with Nickolay on the #cmusphinx IRC channel to make a Spanish version of the applet (I helped with some translation and with a basic Spanish phoneme dictionary). It was tricky but possible. I currently have it installed on my Fedora system and it is able to recognize simple commands like mouse up/down, open terminal/gedit and so on, but it doesn't actually do anything after the command is recognized. That is a GNOME accessibility problem that the Sphinx guys have (not too serious; they are working to make it work again, and the previous version did work). I think Nickolay used some of Dr. Nolasco's resources (he is from CMU too) or has some programs to automate some tasks...
Let me know if you are interested and I can try to find the time to dig up the commands/config that I used...
Some months ago I also helped VoxForge with some translation. I don't have much time now, but the subject is still very interesting to me, so I can try to find some time... also, you are moving much faster than I am...
Mauricio
P.S.: hi Ken!! Seems like Spanish is making progress :-)
--- (Edited on 10/30/2008 12:19 pm [GMT-0400] by kmaclean) ---
Hi Ubanov,
>Then the only way to make a simple voice recognition program in Spanish
>is to make a speaker dependent acoustic model, isn't it?
Not necessarily... collecting speech the way we are currently doing it at VoxForge will achieve this goal also, it will just take time...
Regardless, you can still make a speaker dependent acoustic model yourself. Just follow the CMU Robust Group Sphinx Tutorial or the VoxForge HTK/Julius tutorial (and don't forget to donate your recorded speech to VoxForge).
To create a simple dialog manager (more dialog managers), as buhochileno's post says, you might use the Gnome-voice-control applet with a Sphinx acoustic model. You might also try Perlbox. For HTK/Julius, look at RainCT's blog post on how to create a simple dialog manager in Python; Colin Beckingham's article on Linux.com covers this too. You might also try Simon.
Hope that helps,
Ken
--- (Edited on 10/30/2008 12:53 pm [GMT-0400] by kmaclean) ---
Hi Ken,
For an English speaker who would like to work with voice recognition software: if he donates his voice, and the next day downloads the nightly build of the files that you generate (which include information from his voice), could he then easily configure Julius or Sphinx to recognize his voice? Is this correct?!?
Regards,
Ivan
--- (Edited on 11/3/2008 6:11 am [GMT-0600] by ubanov) ---
Yes, if the speech files are incorporated before that night. But as you can see at http://voxforge.org/home/listen, it takes longer, so you probably have to wait a month.
--- (Edited on 11/3/2008 4:27 pm [GMT-0600] by Visitor) ---
Hi Ivan,
>if he donates his voice, and then the next day he downloads
>the nightly build of files that you generate
Yes, that is supposed to be the process, but I have been slacking at getting this turned around quickly.
I have been working on a way to allow submissions to be posted to the VoxForge site immediately, and allow users to validate the submission on-line (i.e., make sure the prompts match the recording, rate the quality, flag for inclusion/exclusion from the nightly build, etc.).
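One way to picture the validation step is as a small record per submission plus a filter that decides what goes into the nightly build. This is only a sketch under stated assumptions: the field names, the 1 to 5 quality scale, and the threshold are all hypothetical and do not describe how the VoxForge site actually stores or selects submissions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Submission:
    """Hypothetical validation record for one uploaded recording session."""
    name: str
    prompts_match: bool  # validator confirmed the prompts match the audio
    quality: int         # assumed rating scale: 1 (poor) to 5 (excellent)
    include: bool        # validator flagged it for inclusion in the build


def select_for_nightly_build(subs: List[Submission], min_quality: int = 3) -> List[str]:
    """Return the names of submissions that pass all validation checks."""
    return [
        s.name
        for s in subs
        if s.include and s.prompts_match and s.quality >= min_quality
    ]


submissions = [
    Submission("speaker-a", prompts_match=True, quality=4, include=True),
    Submission("speaker-b", prompts_match=False, quality=5, include=True),
    Submission("speaker-c", prompts_match=True, quality=2, include=True),
]
print(select_for_nightly_build(submissions))  # ['speaker-a']
```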
Once that is done, and the bugs get ironed out on the English side, then we will be able to look at other languages (like Spanish...).
Ken
P.S. I am travelling now, so my replies might be a little more delayed...
--- (Edited on 11/3/2008 8:39 pm [GMT-0500] by kmaclean) ---
Hi Ubanov,
>Then my functionality request is not really necessary... the correct way
>could be to have this process finished, isn't it?!?
Yes :)
Ken
--- (Edited on 11/16/2008 10:46 am [GMT-0500] by kmaclean) ---