VoxForge
Ahel asks:
Hi!
I'm intending to apply for this idea
related to VoxForge: briefly, Peter (my mentor) thought that
everyone who installs simon trains the vocabulary recognition
for himself, and so on; so it would be cool if simon would "envoy"
semi-automatically the trained audio files that VoxForge is collecting.
May I know what you think about this idea?
Also I'd like to discuss with you whether and how it would be best to
implement, in your opinion, the technology in support of this idea.
Thank you for your time.
Please feel free to contact me whenever you wish.
--- (Edited on 4/15/2011 8:27 pm [GMT-0400] by kmaclean) ---
My reply:
Hi Ahel,
Hi!
I'm intending to apply for this idea related to VoxForge: briefly, Peter (my mentor) thought that everyone who installs simon trains the vocabulary recognition for himself, and so on; so it would be cool if simon would "envoy" semi-automatically the trained audio files that VoxForge is collecting.
May I know what you think about this idea?
Also I'd like to discuss with you whether and how it would be best to implement, in your opinion, the technology in support of this idea.
--- (Edited on 4/15/2011 8:28 pm [GMT-0400] by kmaclean) ---
Reply from Ahel:
Hi Ken,
Thank you for your answer, and I'm sorry I haven't replied until now.
My replies follow:
Hi Ahel,
Glad to hear that you are interested in helping out with open source speech recognition!
My replies follow:
On Tue, Apr 5, 2011 at 5:50 PM, Ahel ibn Alquivr <[email protected]> wrote:
Hi!
I'm intending to apply for this idea related to VoxForge: briefly, Peter (my mentor) thought that everyone who installs simon trains the vocabulary recognition for himself, and so on; so it would be cool if simon would "envoy" semi-automatically the trained audio files that VoxForge is collecting.
When you say "envoy" do you mean send or upload to the VoxForge collection site?
May I know what you think about this idea?
I think it is a great idea, especially for the languages for which we have fewer submissions (i.e. non-English languages). For English, it would be an excellent way to obtain more speech, and to improve the quality of submissions, since, as Peter mentions in one of the comments on the page you linked, Simon can help with filtering out poor recordings.
Some things to think about:
How much speech do we need?
Peter also mentions in the GSOC project description that we need lots more speech to create acoustic models for dictation applications (unfortunately, I have helped spread that misconception...). Though more speech generally is better, there are diminishing returns after a point - and it is not clear where that point is. See the following links:
- Nickolay (CMU Sphinx maintainer) post in this thread: VoxForge Acoustic Models w/ Sphinx 4):
- Nickolay's blog post here: How to create a speech recognition application for your needs.
- Arthur Chan (former CMU Sphinx maintainer) paper here: Do we have a true open source dictation machine?:
- The CMU Sphinx site's speech corpus size estimates
I have guessed 140 hours of speech is required for a good command and control acoustic model - simply because that is what the main CMU Sphinx acoustic model uses...Not sure how much is required for dictation, though one would "assume" more would be required. However, as Nickolay clearly argues in his posts, and the papers he links to, more is not necessarily better.
Types of speech
There are many types of speech that can be collected - see this FAQ entry: What is a speech corpus or speech corpora? Command and control can use acoustic models trained with 'read' speech, and work quite well.
Dictation acoustic models *may* need to be trained with 'spontaneous speech' rather than 'read' speech (or a mix thereof) to be effective. See this thread: how to get more voice samples?
Conclusion
I can see your application being very good at collecting lots of 'read' speech for the applications that people actually use command and control speech recognition for... That in and of itself would be very valuable to the open source community.
Also I'd like to discuss with you whether and how it would be best to implement, in your opinion, the technology in support of this idea.
The VoxForge speech submission applet uses Java-based Postlet code for its client uploader, and the PHP server-side code described on the Postlet site.
That should give you a pretty good idea of what is needed on the upload side. Please make sure the upload URL is configurable by the user - so that if we change providers, it is just a simple change of URL.
Do you have any other approaches that might work?
I thought that could simply be editable in the source, so that when the URL changes, a new version could be released with a simple patch containing the new URL posted by VoxForge.
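The configurable-URL idea discussed above can be sketched roughly as follows. This is a minimal sketch in Python (simon itself is C++/Qt and Postlet is Java, but the principle is language-agnostic); the config section and key names are hypothetical:

```python
# Hypothetical sketch: read the upload URL from a user-editable config file
# instead of hardcoding it, so that switching hosting providers only needs
# a config change rather than a new release.
import configparser

DEFAULT_URL = "http://www.voxforge.org/upload"  # fallback if nothing is configured

def upload_url(config_text: str) -> str:
    """Return the configured upload URL, falling back to the default."""
    cfg = configparser.ConfigParser()
    cfg.read_string(config_text)
    return cfg.get("submission", "url", fallback=DEFAULT_URL)

# A user config overriding the destination:
user_cfg = "[submission]\nurl = http://example.org/new-upload\n"
print(upload_url(user_cfg))  # the user-configured URL wins
print(upload_url(""))        # no config: falls back to the default
```

Shipping a default inside the binary while honouring a user override gives both behaviours the thread discusses: it works out of the box, and a provider change needs no new release.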
--- (Edited on 4/16/2011 12:19 pm [GMT-0400] by kmaclean) ---
Email from his mentor, Peter:
How much speech do we need?
Peter also mentions in the GSOC project description that we need lots more speech to create acoustic models for dictation applications (unfortunately, I have helped spread that misconception...). Though more speech generally is better, there are diminishing returns after a point - and it is not clear where that point is.
Don't worry, I know that this won't automatically turn the Voxforge
models into high quality dictation models :)
We have been experimenting with model generation ourselves (partly
under professional counsel) so this is not really new for me. I also
can't remember saying that this initiative is directly related to
dictation (if you read the blog post again, I was merely asserting
the quality of the current model).
However, having more samples to play with never _hurts_. It allows
us to define higher quality criteria for the submission (and still
have enough samples). If the amount of samples is large enough, one
can even break down the model according to dialect groups, etc.
And of course gathering samples is very important for non-English
languages where the current corpus is very small.
Also I'd like to discuss with you if and how ways would be better to implement, in your opinion, the technology in support of this idea.
The VoxForge speech submission applet uses Java-based Postlet code for its client uploader, and the PHP server-side code described on the Postlet site.
Have you looked into ssc / sscd?
We already use it for our sample acquisition ourselves and having
sscd and simon compatible would be a huge plus for us.
Would you consider setting up an sscd server for Voxforge?
That should give you a pretty good idea of what is needed on the upload side. Please make sure the URL for site for uploading is configurable by the user - so that if we change providers, it is just a simple change in URL.
Don't worry, it will be :)
Best regards,
Peter
--- (Edited on 4/15/2011 8:32 pm [GMT-0400] by kmaclean) ---
Hi Pete/Ahel,
[...]
Don't worry, I know that this won't automatically turn the Voxforge models into high quality dictation models :)
We have been experimenting with model generation ourselves (partly under professional counsel) so this is not really new for me. I also can't remember saying that this initiative is directly related to dictation (if you read the blog post again, I was merely asserting the quality of the current model).
However, having more samples to play with never _hurts_. It allows us to define higher quality criteria for the submission (and still have enough samples). If the amount of samples is large enough, one can even break down the model according to dialect groups, etc.
And of course gathering samples is very important for non-English languages where the current corpus is very small.
Have you looked into ssc / sscd?
We already use it for our sample acquisition ourselves and having sscd and simon compatible would be a huge plus for us.
Would you consider setting up an sscd server for Voxforge?
--- (Edited on 4/15/2011 8:35 pm [GMT-0400] by kmaclean) ---
Hello everybody,
Am 2011-04-08 16:16, schrieb Ken MacLean:
We already use it for our sample acquisition ourselves and having sscd and simon compatible would be a huge plus for us.
Would you consider setting up an sscd server for Voxforge?
Depends - can it run on a regular web host, or does it need a specialized server instance?
Well it has a server component so you need to have _some_ access. It
doesn't really need root access but if it doesn't have one it can't
bind to any of the lower ports (OS restriction).
Realistically, it should run on a server that you have at least SSH
access to. It also requires a MySQL database...
The VoxForge front end (WebGUI CMS) is on a server in my basement, and the VoxForge speech submission applet uploads to php code on a 1&1 web hosting account. This was done on the assumption that my home internet connection (10 mbit down/ 1 mbit up) was not fast enough to accommodate speech uploads (along with current web traffic). If the ssc could throttle itself a bit, then maybe it could be installed on the VoxForge server.
Well you can set up QoS to limit the impact of incoming traffic on
your web traffic. It really shouldn't be noticeable.
The nice thing about speech submissions is that they will probably
be limited by the upload speed of the clients (asynchronous lines
have much faster download speeds than upload speeds). So you alone
could accommodate 10 other people having the same internet
connection you have (10 x 1mbit up = your 10mbit down).
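Peter's back-of-envelope arithmetic above can be checked with a one-liner (a rough sketch - real throughput also depends on protocol overhead and concurrency):

```python
# Rough capacity estimate: how many clients, each uploading at their (slower)
# asymmetric upstream rate, can a server's downstream link absorb at once?
def clients_supported(server_down_mbit: float, client_up_mbit: float) -> int:
    return int(server_down_mbit // client_up_mbit)

print(clients_supported(10, 1))  # ten 1 Mbit/s uploaders fill a 10 Mbit/s line
```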
Maybe you could have a look at ssc/sscd? Both have a manual but you
can also ask me if you have any difficulties...
Best regards,
Peter
--- (Edited on 4/15/2011 8:36 pm [GMT-0400] by kmaclean) ---
Well it has a server component so you need to have _some_ access. It doesn't really need root access but if it doesn't have one it can't bind to any of the lower ports (OS restriction).
I assume you mean megabit, right?
Well you can set up QoS to limit the impact of incoming traffic on your web traffic. It really shouldn't be noticeable.
So you alone could accommodate 10 other people having the same internet connection you have (10 x 1mbit up = your 10mbit down).
Maybe you could have a look at ssc/sscd? Both have a manual but you can also ask me if you have any difficulties...
--- (Edited on 4/15/2011 8:37 pm [GMT-0400] by kmaclean) ---
Well it has a server component so you need to have _some_ access. It doesn't really need root access but if it doesn't have one it can't bind to any of the lower ports (OS restriction).
My 1&1 account does not allow access to the lower ports.
The port is configurable so that wouldn't be that much of a problem
but setting it up on a system where you have full control would
probably be easier.
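The configurable-port point can be illustrated with a small stand-in sketch (Python here; sscd itself is a compiled daemon). Binding to an unprivileged port (>1023) needs no root access, and port 0 asks the OS to pick any free one:

```python
import socket

# Bind a TCP listening socket on a configurable port. Ports below 1024 need
# root on most Unix systems (the "OS restriction" mentioned above); port 0
# lets the OS choose any free unprivileged port instead.
def open_listener(port: int = 0, host: str = "127.0.0.1") -> socket.socket:
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen(1)
    return srv

srv = open_listener()
print(srv.getsockname()[1])  # the ephemeral port the OS chose (> 1023)
srv.close()
```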
I really don't think that we would get a flood of submissions (and
if we really are that overwhelmed with good recordings I'm sure we
can arrange for more powerful hardware / more bandwidth - quite
possibly during a break of all the celebration :P).
So you alone could accommodate 10 other people having the same internet connection you have (10 x 1mbit up = your 10mbit down).
I agree, in theory - but it doesn't seem to work that way in practice for some reason...
Traffic shaping / QoS can make a ton of difference. But again, I really don't think that bandwidth will be the bottleneck (at least not at the beginning).
Maybe you could have a look at ssc/sscd? Both have a manual but you can also ask me if you have any difficulties...
Will do, after April 23, I will have much more free time.
Feel free to contact me for more information!
I noticed that all audio is stored in MySQL (rather than pointers to static files) - is there a reason for this? What kind of security does sscd use - has it been used on the Internet, or mostly in "behind the firewall" LAN configurations?
It actually only stores the filenames in the database.
sscd doesn't yet provide any kind of security (we use a VPN setup)
but adding some shouldn't really be hard. Note that the ssc
protocol does not yet allow retrieving samples over the network
(only retrieving user data and uploading data). To create models from
the data I wrote a couple of scripts that query the db, retrieve the
samples and write appropriate prompts files.
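A sketch of what such a script might look like, under stated assumptions: the table and column names here are hypothetical (Peter doesn't describe sscd's schema), sqlite3 stands in for the MySQL backend sscd actually uses, and the output follows the common Sphinx-style "transcript (file_id)" prompts format:

```python
# Hypothetical sketch: the database stores only sample *filenames* (as noted
# above), so a training script queries them plus their transcripts and writes
# a prompts file with one "TRANSCRIPT (file_id)" line per sample.
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for sscd's MySQL database
db.execute("CREATE TABLE samples (filename TEXT, transcript TEXT)")
db.executemany("INSERT INTO samples VALUES (?, ?)",
               [("spk1/0001.wav", "OPEN THE DOOR"),
                ("spk1/0002.wav", "CLOSE THE WINDOW")])

def write_prompts(conn) -> str:
    """Build the prompts-file contents from the sample table."""
    lines = []
    for filename, transcript in conn.execute(
            "SELECT filename, transcript FROM samples ORDER BY filename"):
        file_id = filename.rsplit(".", 1)[0]  # drop the .wav extension
        lines.append(f"{transcript} ({file_id})")
    return "\n".join(lines)

print(write_prompts(db))
```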
So yes, we would need to adjust sscd a bit to fit this use
case (also: fields in the database, etc.) and I'm not sure that
makes it viable for this GSoC proposal (short timeframe). Anyway,
let's see if the project gets selected and if so (or not) then we
can make further arrangements.
Best regards,
Peter
--- (Edited on 4/15/2011 8:37 pm [GMT-0400] by kmaclean) ---
Great news, Ahel's project proposal got accepted!
From the GSOC site:
The main aim of this proposal is a working project that collects audio for acoustic models and then sends it to the VoxForge server, by re-utilizing simon code (in particular ssc/sscd). VoxForge is an open source project whose aim is to collect transcribed speech for use in Open Source Speech Recognition Engines. http://www.voxforge.org/ Simon is an open-source speech recognition program that can utilize the models created from the VoxForge data. http://simon-listens.org
--- (Edited on 4/25/2011 10:42 pm [GMT-0400] by kmaclean) ---