Asterisk-based User Speech Submission System
User: kmaclean
Date: 4/4/2007 10:36 am
Views: 6215
Rating: 11

Submission by trevarthan (see original post here)

I've been submitting requests for volunteers on my local Linux user group mailing lists (Chugalug and ALE). So far one person from the ALE list has mentioned using an automated IVR to ease the prompt recording burden. I know the quality would be much lower than what we normally want and therefore only usable for recognition in a similar environment, but I believe the PSTN environment is a large part of the target market, and such a system would eliminate the user's need to install software and purchase a headset mic (everyone has a phone). Would this sort of system be useful for a low quality sub-project or side project?

I have a good bit of experience with Asterisk. I would be willing to set up such a system. We would need some sort of integration with voxforge.org for metadata assignment (user gender, login name, etc.) and audio upload, but we could retain the multi-prompt, multi-file format.

 

--- (Edited on 4/ 4/2007 11:36 am [GMT-0400] by kmaclean) ---

Re: Asterisk-based User Speech Submission System
User: kmaclean
Date: 4/4/2007 10:38 am
Views: 289
Rating: 19

Submission by trevarthan (see original post here)  

>>Would this sort of system be useful for a low quality sub-project or side project?

>very interesting ... I have not looked at this for a while, are there free gateway facilities to dial (from PSTN) into an Asterisk box? 

I have some open source friendly business contacts running asterisk systems. I might be able to borrow a slot or two on someone's T1 PRI. 

>>Would this sort of system be useful for a low quality sub-project or side project?

>I think it would be.  Any transcribed audio is useful (though higher sampling rate/bits per sample gives us much more flexibility - because we can downsample the audio to more than one target market).

>The VoxForge UserSubmission API (currently a work in progress) would be useful in this regard.  

>What would your high level view be at the server/network level?

 

We can obviously get as complicated as we need in the future, but for a starter system I was thinking we hard code the prompts (either text to speech using festival or manually recorded - text to speech is probably easier) to avoid any sort of prompting API.

The prompt flow would be simple: accept audio followed by a # sign, play back the prompt, ask if it's OK, if yes then continue to next prompt, if no, replay prompt and repeat audio recording. Some of those prompts are rather long. We might want to break them up into shorter segments since people will be trying to remember them on the spot. Or else we might require people to be looking at the web site while they record prompts, in which case we won't need to play the prompts for the user and everything can be assigned a numeric code or something.
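
A minimal sketch of that record/confirm loop, independent of Asterisk itself. The function name, the `record` and `confirm` callbacks, and the `max_attempts` cap are all assumptions made for illustration; in a real system they would be backed by AGI or dial plan calls.

```python
from typing import Callable, List, Tuple


def run_prompt_session(
    prompts: List[str],
    record: Callable[[str], bytes],
    confirm: Callable[[str, bytes], bool],
    max_attempts: int = 3,  # assumed cap so a caller cannot loop forever
) -> List[Tuple[str, bytes]]:
    """Walk through the prompts, re-recording each one until the caller accepts it."""
    accepted = []
    for prompt in prompts:
        for _ in range(max_attempts):
            audio = record(prompt)      # caller speaks, then presses '#'
            if confirm(prompt, audio):  # play the take back and ask if it is OK
                accepted.append((prompt, audio))
                break
        # A prompt that is never accepted is skipped rather than stored.
    return accepted
```

Keeping the telephony parts behind two callbacks means the flow can be exercised with canned answers before any phone line is involved.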

When the user is finished recording prompts the asterisk server tgz's them up and submits them via HTTP POST in exactly the same way we do it now.

Difficulties come from all of the metadata we currently require along with the audio:

LICENSE file can be a canned catch-all file assigning copyright to you or me or the FSF or something. (Obviously we'll need to play a disclaimer stating this before we allow the user to begin recording, but that's not a problem.)

prompts file is a no brainer.

README file is where it gets fuzzy. We can require the user to log into the IVR with some sort of numeric username and/or PIN and then pull the README data from their WWW account (requires modification to the voxforge site), or we can do away with the README file altogether, or we could prompt for the values of the README file, store the metadata as audio, and either transcribe it later or leave it as-is.

How does all that sound? 

--- (Edited on 4/ 4/2007 9:02 am [GMT-0500] by trevarthan) ---

--- (Edited on 4/ 4/2007 11:38 am [GMT-0400] by kmaclean) ---

Re: Asterisk-based User Speech Submission System
User: kmaclean
Date: 4/4/2007 11:07 am
Views: 314
Rating: 16

>I was thinking we hard code the prompts (either text to speech using festival or manually recorded - text to speech is probably easier) to avoid any sort of prompting API.

Excellent - I was worried about creating something too elaborate only to find out that no one would be really interested in submitting speech via the phone.

>We might want to break them up into shorter segments since people will be trying to remember them on the spot.

It would make sense to break up the longer prompts.  It does not matter one way or another from an Acoustic Model training process ... as long as the prompts match what is in the audio file.
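
One way this could look, as a rough sketch: split each prompt into fixed-size word groups and give every segment its own ID, so each recorded file still has a transcription that matches it exactly. The `split_prompt` name, the `-1`, `-2` ID suffix, and the six-word default are assumptions for illustration, not VoxForge conventions.

```python
from typing import List, Tuple


def split_prompt(prompt_id: str, text: str, max_words: int = 6) -> List[Tuple[str, str]]:
    """Split one prompt into (segment_id, segment_text) pairs of at most max_words words."""
    words = text.split()
    segments = []
    for start in range(0, len(words), max_words):
        part = start // max_words + 1  # 1-based segment number
        segment_text = " ".join(words[start:start + max_words])
        segments.append((f"{prompt_id}-{part}", segment_text))
    return segments
```

Cutting on a fixed word count can break a phrase in an awkward place; splitting at punctuation would probably read better aloud, at the cost of uneven segment lengths.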

>Or else we might require people to be looking at the web site while they record prompts

I think it would be better to stick to an all audio system before going multi-modal.

>README file is where it gets fuzzy.

The user created 'Recording Information' and 'File Info' sections of the README are not really relevant in this context.  Asterisk AGI scripts could fill in these sections.
With respect to Speaker Characteristics, I am thinking that users of a phone based system would be more interested in short submissions (1-5 prompt lines).  Because of this, Pronunciation Dialect (which would take too long for a user to determine over the phone) and Age Range (not as relevant since we are collecting lower quality audio) should not be required.  If you can prompt the user for their Gender that would be great (but this could be manually determined after the fact).  Therefore, to start out with, I think we do without Speaker Characteristics.

So basically, I think I see an Asterisk dial plan or AGI script for prompting users using Asterisk's built-in Festival implementation, saving the audio to a directory, keyed by some sort of unique user ID, and when enough audio files are collected (40-ish audio files), submitting them as a group under a user ID on VoxForge called something like Asterisk (or under your user ID).  I don't think there is a need for telephony users to have a user ID on VoxForge, to keep things simple to start, and since they will not necessarily be logging into the VoxForge website.  Therefore an Asterisk Speech Submission app could be very self-contained to start out with, and if there is interest, then we could look at tighter integration with the VoxForge website.
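
A small sketch of the "submit once enough audio is collected" check. It assumes one reading of the above: recordings sit in one subdirectory per caller ID under a spool directory, and the 40-file threshold counts files across all callers. The `pending_batch` name and the directory layout are hypothetical.

```python
from pathlib import Path
from typing import List


def pending_batch(spool_dir: Path, threshold: int = 40) -> List[Path]:
    """Return every collected WAV file once at least `threshold` exist, else an empty list."""
    # Assumed layout: spool_dir/<caller-id>/<recording>.wav
    wavs = sorted(spool_dir.glob("*/*.wav"))
    return wavs if len(wavs) >= threshold else []
```

A periodic job could call this and hand a non-empty result to whatever does the packaging and submission.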

Let me know what you think 

Great Project idea BTW!

Thanks,

Ken 

--- (Edited on 4/ 4/2007 12:07 pm [GMT-0400] by kmaclean) ---

Re: Asterisk-based User Speech Submission System
User: trevarthan
Date: 4/4/2007 12:11 pm
Views: 341
Rating: 17

I like it. Clean and simple. I'll prompt each user for gender (DTMF:0=male,1=female) and otherwise leave out Speaker Characteristics until we decide whether or not the project is useful enough to warrant additional complexity.
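
The DTMF mapping is small enough to sketch directly. Only the 0=male, 1=female assignment comes from the plan above; the function name and the choice to return `None` for any other key (so the field can be left blank and filled in by hand later) are assumptions.

```python
from typing import Optional

# Key assignment from the plan above: 0 = male, 1 = female.
GENDER_BY_DTMF = {"0": "male", "1": "female"}


def gender_from_dtmf(digit: str) -> Optional[str]:
    """Translate a single DTMF key press into a gender label, or None if unrecognised."""
    return GENDER_BY_DTMF.get(digit.strip())
```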

Tell you what: I won't even bother with automated submissions at first to avoid false posts and other kinks. I'll submit them manually on a weekly basis starting out (more often if we get a lot of traffic).

Each bulk tgz will contain a single subdirectory (YYYYMMDD-HHMMSS-uniqueid) for *each* recording session. Each subdir will contain its own prompts, WAVs, LICENSE, and README files.
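
Roughly, the packaging step could look like the sketch below. The function names and the idea of a `sessions_root` directory holding one subdirectory per session are assumptions; only the YYYYMMDD-HHMMSS-uniqueid naming and the one-subdirectory-per-session layout come from the description above.

```python
import tarfile
from datetime import datetime
from pathlib import Path
from typing import List


def session_dir_name(started: datetime, unique_id: str) -> str:
    """Build the YYYYMMDD-HHMMSS-uniqueid directory name for one recording session."""
    return f"{started:%Y%m%d-%H%M%S}-{unique_id}"


def build_bulk_tgz(sessions_root: Path, archive_path: Path) -> List[str]:
    """Pack every session subdirectory under sessions_root into a single .tgz."""
    names = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for session in sorted(p for p in sessions_root.iterdir() if p.is_dir()):
            # arcname keeps only the session directory name inside the archive.
            tar.add(session, arcname=session.name)
            names.append(session.name)
    return names
```

The returned list of session names is handy for a quick manual check before each weekly upload.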

Sounds like a fun project. Can't wait to get started. :) I'll begin asking around for PRI space immediately and start developing a prototype on my home system tonight. Worst case I'll just run it on my phone system at home and we'll be restricted to one line. But I'm hoping I can find a better home for the project.

--- (Edited on 4/ 4/2007 12:11 pm [GMT-0500] by trevarthan) ---

Re: Asterisk-based User Speech Submission System
User: kmaclean
Date: 4/12/2007 2:27 pm
Views: 2216
Rating: 16
Added the following Trac and SVN sites for the VoxForge IVR project:

Public VoxForge Subversion Repository URL is now located here:

    http://www.dev.voxforge.org/svn/VoxForgeIVR

You can checkout the source code using the following Subversion command:

$ svn checkout http://www.dev.voxforge.org/svn/VoxForgeIVR

The Corresponding Trac site is now located here:

     http://www.dev.voxforge.org/projects/VoxForgeIVR

Ken 

--- (Edited on 4/12/2007 3:27 pm [GMT-0400] by kmaclean) ---
