VoxForge
If you'd like to encourage many more contributions, how about developing a Macromedia Flash voice recorder and embedding it in your website?
This could make it really quick and easy for people to contribute, and persuade many "casual visitors" to record a few of the scripts.
Cheers,
Jon (www.orangejon.com)
--- (Edited on 1/18/2007 6:48 pm [GMT-0600] by Visitor) ---
Hi Jon
I agree. Audio Submission must be as painless as possible if we are to encourage others to contribute.
After seeing your post, I researched Flash a bit. My understanding is that a Flash-based audio recorder is a streaming recorder (i.e. it does not store the audio on the PC). All the posts I've seen mention that you need the Flash Media Server or the Open Source Red5 server for audio to work.
In addition, it seems that the Flash client compresses the audio it streams back to the server. Speech Recognition Engines work best recognizing speech with the same characteristics as the audio their Acoustic Models were trained with. In other words, if you train an Acoustic Model with speech audio collected from regular Telephone lines (8kHz-8bit audio, and 100+ hours of such audio), then the Speech Recognition engine will be good at recognizing that type of speech. It will not be so good at recognizing VoIP speech transmitted using a lossy codec at different sampling rates or bits per sample.
We would need the Flash client to capture and stream uncompressed audio or use a lossless codec. However, because VoxForge is looking for higher sampling rates and bits per sample (48kHz-16bit), we simply would not have the network bandwidth to accommodate such streams.
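To put rough numbers on the bandwidth concern: the bit rate of uncompressed PCM audio is the sample rate times the bits per sample times the number of channels. The sketch below assumes mono audio (the post does not say), and the function name is made up for illustration.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed PCM audio in kbit/s (1 kbit = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Telephone-quality audio as described above: 8 kHz, 8-bit.
telephone_kbps = pcm_bitrate_kbps(8_000, 8)
# The format VoxForge is looking for: 48 kHz, 16-bit (mono is an assumption).
voxforge_kbps = pcm_bitrate_kbps(48_000, 16)

print(telephone_kbps, voxforge_kbps)  # 64.0 768.0
```

Under that assumption, a single real-time 48kHz-16bit stream needs twelve times the bit rate of a telephone-quality one.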
Please let me know if I am wrong in this interpretation - it would be great if Flash would permit a simple way to record audio onto a user's hard drive, which they could then upload to VoxForge.
I am currently looking at a Java based solution to address this need. The Java Sound Demo looks like a good starting point for such an app.
all the best,
Ken
--- (Edited on 1/19/2007 12:09 pm [GMT-0500] by kmaclean) ---
Hi Jon,
After more research, it may be that a Flash-based audio recorder could fit the bill - if it buffers its stream before sending it to the server. We basically cannot support a real-time stream from a client to the VoxForge server, but if the Flash client can let the user record, and then stream the audio afterwards rather than in real time, this might be workable.
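The "buffer first, send later" idea is independent of Flash. As a minimal sketch in Python (not Flash or Java), the client could collect raw PCM chunks while the user speaks, pack them into a WAV file once recording stops, and only then hand the file over in pieces at whatever pace the link allows. The function names, the 64 KB piece size and the silent stand-in audio are all assumptions for illustration.

```python
import io
import wave


def frames_to_wav_bytes(chunks, sample_rate_hz=48_000, sample_width_bytes=2, channels=1):
    """Pack buffered PCM chunks into a complete in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width_bytes)
        wav.setframerate(sample_rate_hz)
        for chunk in chunks:
            wav.writeframes(chunk)
    return buffer.getvalue()


def split_for_upload(data, chunk_size=64 * 1024):
    """Split the finished file into pieces that can be sent after recording ends."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# Stand-in for captured audio: one second of 16-bit silence in ten chunks.
recorded_chunks = [b"\x00\x00" * 4_800 for _ in range(10)]
wav_bytes = frames_to_wav_bytes(recorded_chunks)
pieces = split_for_upload(wav_bytes)
print(len(wav_bytes), "bytes in", len(pieces), "pieces")
```

Nothing leaves the client until the recording is complete, so the server never has to keep up with a live stream.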
Anyway, there are some examples on the Red5 server site (an Open Source Flash server) that I need to take a look at in more detail before deciding on a Flash or Java WebStart (or applet) solution.
thanks for pointing out Flash as a possible approach,
Ken
--- (Edited on 1/20/2007 5:13 pm [GMT-0500] by kmaclean) ---
More info:
Flash uses its own audio codec, the Nellymoser Asao Codec. This codec is proprietary, although there was a bounty for an Open Source implementation of a compatible audio codec. I assume it uses lossy compression.
--- (Edited on 2/13/2007 3:43 pm [GMT-0500] by kmaclean) ---
I personally think Java will provide the most portable solution (assuming Flash is out of the question due to streaming and/or codec restrictions.)
Let's take a step back though and look at the problem one step at a time...
Ken, why don't you (or we?) define one or more API function calls for:
1- selecting/getting a set of prompts
2- submitting a single recording file (wav+prompt-text+userid etc.)
I suggest the API should be some sort of HTTP-based XML like SOAP or something. The key is that it be simple and programming-language neutral. The server would of course be hosted on VoxForge servers.
The next step would be to allow the community to develop clients for this API. A client can record a file or set of files locally and upload via the API as needed. Some people like Java, some like Flash, some like AJAX or .NET. Maybe VoxForge hosts/provides one or two "official" clients (e.g. Java/Flash) and the community can integrate others as desired.
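To make the shape of the two calls concrete, here is a toy in-memory sketch. It is not a VoxForge API: the class, method and field names are all invented, and a real version would sit behind HTTP on the server rather than in the client's process.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    """One recording: wav + prompt text + user id, as in item 2 above."""
    user_id: str
    prompt_id: str
    prompt_text: str
    wav_bytes: bytes


@dataclass
class InMemoryPromptApi:
    """Toy stand-in for the server side (hypothetical names throughout)."""
    prompts: dict  # prompt_id -> prompt text
    received: list = field(default_factory=list)

    def get_prompts(self, user_id, count=1):
        """Call 1: return up to `count` prompts the user has not yet submitted."""
        done = {s.prompt_id for s in self.received if s.user_id == user_id}
        todo = [(pid, text) for pid, text in self.prompts.items() if pid not in done]
        return todo[:count]

    def submit_recording(self, submission):
        """Call 2: accept a single recording file with its metadata."""
        self.received.append(submission)


api = InMemoryPromptApi(prompts={"p1": "first example sentence",
                                 "p2": "second example sentence"})
prompt_id, text = api.get_prompts("joe")[0]
api.submit_recording(Submission("joe", prompt_id, text, b"placeholder audio"))
print(api.get_prompts("joe"))
```

Any client (Java, Flash, AJAX, .NET) would only need to speak these two calls, whatever wire format is eventually chosen.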
-joe
--- (Edited on 2/22/2007 09:37:39 [GMT-0500] by jaiger) ---
Hi Joe,
thanks for the feedback...
I was actually trying to figure out how to get help from the Google Summer of Code project for the audio submission portion of the VoxForge project. I did not include it in the current list because I thought it would have been too 'WebGUI cms' oriented and not 'speech recognition' oriented enough. I never thought of documenting an API for prompt selection and speech submission. Excellent idea. Any help you can provide in creating such an API would be greatly appreciated.
What
Basically we need a way to allow users to submit transcribed speech to the VoxForge Website, in a way that permits as much automation of the submission and back-end Acoustic Model creation processes as possible.
Technical Limitations
Although the API should be technology agnostic, it needs to work with our current setup, which is as follows:
VoxForge uses the WebGUI Content Management System. The server side uses Perl and MySQL, and the client side uses CSS, HTML and JavaScript.
The VoxForge Front-end web server (www.voxforge.org) has bandwidth limitations - from a user perspective it has 5 mbit upload and 800 kbit download (according to my ISP at least .... ).
How
I don't have much experience with SOAP or XML based APIs, so I'll need your help on this one.
As a start to creating an API, here are some comments:
1- selecting/getting a set of prompts
I currently have a script (a 'macro' in WebGUI speak) that will randomly select a prompt file for a user. It also keeps track of which prompts the user has already submitted in the user's profile. The prompt file stays selected until the user submits audio corresponding to it (I have not completed this part yet); the macro will then randomly select another prompt file. The prompts it selects are basically the prompt file 'children' of this URL (i.e. http://www.voxforge.org/home/submitspeech/linux/step-1/phoneme), so new prompt files can easily be added.
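The selection logic described here can be sketched in a few lines. This is an illustration in Python, not the actual WebGUI macro (which is Perl); the function name, argument names and example file names are assumptions.

```python
import random


def pick_prompt_file(available, submitted, current=None, rng=random):
    """Return the prompt file a user should record next.

    available: all prompt file names found under the prompts URL
    submitted: names the user has already submitted audio for
    current:   the file already assigned to the user, if any
    """
    # Keep the current selection until audio has been submitted for it.
    if current is not None and current not in submitted:
        return current
    remaining = [name for name in available if name not in submitted]
    if not remaining:
        return None  # the user has covered every prompt file
    return rng.choice(remaining)


files = ["prompts-a", "prompts-b", "prompts-c"]
first = pick_prompt_file(files, submitted=set())
same = pick_prompt_file(files, submitted=set(), current=first)
following = pick_prompt_file(files, submitted={first}, current=first)
print(first, same, following)
```

Because the candidate list is rebuilt from `available` on every call, adding a new prompt file needs no change to the selection code.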
The approach I took is that the user needs to be signed in to the VoxForge website. They then click a link ('Submit Speech') and get a submission page with the randomly selected prompts file displayed on it, which they then read to create the audio files on their PC (based on feedback from atterer). The user then creates a zip or tarball of the recorded files, adds it as an attachment, and saves (i.e. uploads) their submission. I was hoping to figure out a way to get the 'client' to do this transparently, so that the user would not have to understand zip or tarballs, just click "upload".
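As a rough sketch of what "transparent" packaging could look like on the client side, the following builds the archive in memory so the user never sees a zip file. The function name, the `prompts.txt` entry name and the example file names are assumptions, not the layout VoxForge actually expects.

```python
import io
import zipfile


def bundle_submission(wav_files, prompts_text):
    """Build a zip archive in memory from recorded audio and the prompts text.

    wav_files:    mapping of file name -> WAV bytes
    prompts_text: the prompt lines the user read, as one string
    """
    buffer = io.BytesIO()
    # DEFLATE is lossless, so the audio is bit-for-bit intact after unzipping.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in sorted(wav_files.items()):
            archive.writestr(name, data)
        archive.writestr("prompts.txt", prompts_text)
    return buffer.getvalue()


archive_bytes = bundle_submission(
    {"a0001.wav": b"placeholder-audio-1", "a0002.wav": b"placeholder-audio-2"},
    "a0001 FIRST EXAMPLE SENTENCE\na0002 SECOND EXAMPLE SENTENCE\n",
)
print(len(archive_bytes), "bytes ready to upload")
```

The resulting bytes could then be attached to the submission in a single step behind an "upload" button.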
I am not really sure of the best approach from an API perspective, since the server, in this case, keeps track of the audio submitted by a user. Should the client simply keep logon credentials, and request the prompt file from the server? What about if the user wants to submit their own prompts (which at some point we might need to look into to get better triphone coverage)?
Should a different approach be used to make the API simpler? Should we use my approach for now, with a view to evolving it to an API based solution?
2- submitting a single recording file (wav+prompt-text+userid etc.)
Are you thinking that the user should be able to submit one wav file and one prompt file at a time? I was thinking of something along these lines at one point, because users might be more apt to submit 1-3 audio files at a time, rather than having to submit 40 wav files at a time.
But this then requires that the back-end processes/scripts be re-thought ... so that rather than storing the URL of submitted prompt files in the client profile, a file or database table would need to be created on the server that would keep track of each prompt line the user submitted. This would make things easier if the user decided to submit their own prompts.
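A per-line tracking table might look something like the sketch below. It uses Python's built-in sqlite3 purely as a stand-in for the site's MySQL database, and the table, column and function names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE submitted_prompt (
        user_id     TEXT NOT NULL,
        prompt_text TEXT NOT NULL,
        wav_name    TEXT NOT NULL,
        PRIMARY KEY (user_id, prompt_text)
    )
    """
)


def record_submission(conn, user_id, prompt_text, wav_name):
    """Remember that this user has submitted audio for this prompt line."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO submitted_prompt VALUES (?, ?, ?)",
            (user_id, prompt_text, wav_name),
        )


def already_submitted(conn, user_id):
    """Return the set of prompt lines this user has already recorded."""
    rows = conn.execute(
        "SELECT prompt_text FROM submitted_prompt WHERE user_id = ?", (user_id,)
    )
    return {text for (text,) in rows}


record_submission(conn, "ken", "FIRST EXAMPLE SENTENCE", "a0001.wav")
record_submission(conn, "ken", "A SENTENCE THE USER WROTE", "custom01.wav")
print(already_submitted(conn, "ken"))
```

Since the table stores the prompt text itself rather than the URL of a prompt file, a user's own prompts fit in the same way as the ones the site hands out.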
more questions than answers for now,
thanks,
Ken
--- (Edited on 2/22/2007 12:27 pm [GMT-0500] by kmaclean) ---
Ken,
1- selecting/getting a set of prompts
I think what I have in mind for this is basically what you described with the additional details that (a) I define a "prompt" as a single sentence, not 40 sentences and (b) the "API version" of this feature would output XML/plain-text for simple parsing. What I mean by (b) is that your WebGUI CMS would need to be told to not output typical HTML pages (header, footer, javascript, css tags etc.)
The consumer of the API version is not a typical web browser but instead a piece of client software. It may be a Java applet, some AJAX application or Flash etc.
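For illustration, here is how a client might parse such a stripped-down response. The XML layout is purely hypothetical (the element and attribute names are invented here, nothing in this thread defines them), and Python stands in for whatever language a real client uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body: no HTML header, footer, javascript or css.
response_body = """<prompts>
  <prompt id="a0001">FIRST EXAMPLE SENTENCE</prompt>
  <prompt id="a0002">SECOND EXAMPLE SENTENCE</prompt>
</prompts>
"""


def parse_prompts(xml_text):
    """Return a list of (prompt id, sentence) pairs, one per prompt element."""
    root = ET.fromstring(xml_text)
    return [(p.get("id"), (p.text or "").strip()) for p in root.iter("prompt")]


for prompt_id, sentence in parse_prompts(response_body):
    print(prompt_id, sentence)
```

With one sentence per `prompt` element, the client can show and record the prompts one line at a time.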
2- submitting a single recording file
yes, I see the unit of recording should be a single sentence, one line. This goes a long way toward lowering the barrier to contribution. The second barrier is tarball/zip creation/format/upload.
In addition, the API should support custom prompt text. The more the merrier.
More fuel to the fire.
-joe
--- (Edited on 2/22/2007 13:33:05 [GMT-0500] by jaiger) ---
1- selecting/getting a set of prompts
> (a) I define a "prompt" as a single sentence, not 40 sentences
I was afraid of that ... :)
> (b) the "API version" of this feature would output XML/plain-text for simple parsing. What I mean by (b) is that your WebGUI CMS would need to be told to not output typical HTML pages (header, footer, javascript, css tags etc.)
this can be done with WebGUI
2- submitting a single recording file
>yes, I see the unit of recording should be a single sentence, one line. This goes a long way toward lowering the barrier to contribution.
agree
>The second barrier is tarball/zip creation/format/upload.
agree, but I think that would be the responsibility of the client software ... trying to limit project 'scope' here :)
>In addition, the API should support custom prompt text. The more the merrier.
that would be a future release ...
Next Steps:
Better understand the WebGUI User Submission System:
I think I will carry-on with what I am doing for now - at least to better understand the WebGUI submission system, and in order to better understand how the API should look.
Creating new database table in WebGUI:
Then I need to look at how to store larger amounts of data in the user profile or create a table in WebGUI. WebGUI is excellent at letting someone add new fields to a user profile, but I don't think it would scale well for prompt type information - need to confirm this ...
I need to create a working prototype of the server-side script changes before defining the API. I think I will be better off with a working example before trying to create an API. I don't know WebGUI well enough to define an API in the abstract.
Process to send individual prompts to client:
Look at how to deliver the prompts to a client application. I think this should be a relatively straightforward process, once the tables are created. Not sure about authentication - might be easier to get the user to log on to the web site before trying to get prompts from the site.
A Web Admin's job is never done ...
Ken
--- (Edited on 2/22/2007 2:37 pm [GMT-0500] by kmaclean) ---
Converted this thread into a possible Google Summer of Code project idea - see this link.
Results of Next Steps:
Better understand the WebGUI User Submission System:
on-going; still getting up to speed on the finer points of object-oriented Perl.
Creating new database table in WebGUI:
not required - can use WebGUI's user profile to store this information
I need to create a working prototype of the server-side script changes before defining the API. I think I will be better off with a working example before trying to create an API. I don't know WebGUI well enough to define an API in the abstract.
The User Profile object can contain all the information in the README file (any fields can be set as required upon sign-on), and can hold a list of all user submitted prompts.
I've got a macro that can display a specified number ( 1 or more) of random prompts when a user clicks a specific URL (or with a get), which then updates a user profile field with the prompt(s) that have been selected.
I've got the basics of the modifications to the WebGUI collaboration system so that it will pick up the README info from the User's profile, and the Prompts information they have selected.
Basically, a new user entry to the Speech Submission messaging system automatically populates the post entry with all the README and Prompts information (which previously had to be done by hand), and keeps track of their submitted prompts (so that the random prompt generator does not offer them the same prompts twice).
Need to fix some formatting issues, and complete testing.
Process to send individual prompts to client:
completed - client would poll a specific URL, and it will return a single prompt. Not sure about how to deal with client ID and sign-on. Assuming that if a Java web app is the client, user state is handled internally by the web app. Need to confirm this.
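A client-side sketch of that polling step, in Python rather than Java: the URL below is a placeholder (the thread does not give the real one), the plain-text response format is an assumption, and sign-on is left out entirely because that question is still open. The `opener` parameter exists only so the sketch can run offline against a fake server.

```python
import io
import urllib.request

# Placeholder endpoint; the real prompt URL is not given in this thread.
PROMPT_URL = "http://example.org/voxforge/prompt"


def fetch_prompt(url=PROMPT_URL, opener=urllib.request.urlopen, timeout=10):
    """GET the prompt URL and return the single prompt line it serves."""
    with opener(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def fake_opener(url, timeout=None):
    """Offline stand-in for the server so the sketch runs without a network."""
    return io.BytesIO(b"a0001 FIRST EXAMPLE SENTENCE\n")


print(fetch_prompt(opener=fake_opener))
```

Calling `fetch_prompt()` with no arguments would issue a real HTTP GET via `urllib.request.urlopen`.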
Then look at the specifics of an API.
This is now part of the Google Summer of Code project to create a software client to allow users to record and submit audio to the VoxForge website (see this link).
Joe - would you be willing to help on this (Google SoC mentor or backup mentor)?
thanks,
Ken
--- (Edited on 3/ 6/2007 1:59 pm [GMT-0500] by kmaclean) ---
> Joe - would you be willing to help on this (Google SoC mentor or backup mentor)?
Ken,
Sorry about missing this direct question. I didn't know you were expecting a response. I've been pretty busy at work/home which has kept me from contributing to the site recently.
Is it too late for the GSoC?
If not, I would be willing to backup-mentor in some way the submission API.
sorry about the delay.
-joe
--- (Edited on 3/24/2007 11:32:57 [GMT-0400] by jaiger) ---