Acoustic Model Discussions

Question about acoustic model of Sphinx and voxforge
User: cchen1103
Date: 8/2/2010 3:05 pm
Views: 4911
Rating: 2

I tried the acoustic model from Sphinx (Wsj_8KHz) as well as the one from VoxForge for English (8 kHz with LDA transform).

Both acoustic models work well with the sample wave files (10-15% WER with a trigram statistical language model). But when I ran transcription against a telephone conference recording with only one speaker (same 8 kHz sampling rate), the WER increased significantly to over 80%.

If I use the same configuration to transcribe my own microphone recording at an 8 kHz sampling rate, the WER drops to 30%.
I am wondering if the high WER on the conference recording is due to distortion over the phone line and other factors. If so, do I need to train a particular acoustic model? Is there anything else I can do?
Any help would be appreciated. Thanks.

--- (Edited on 8/2/2010 3:05 pm [GMT-0500] by cchen1103) ---

Re: Question about acoustic model of Sphinx and voxforge
User: kmaclean
Date: 9/11/2010 11:17 pm
Views: 2056
Rating: 1

> I am wondering if the high WER on the conference recording is due to distortion over the phone line and other factors. If so, do I need to train a particular acoustic model? Is there anything else I can do?

You might try using noise cancellation (see nsh's post) before sending the audio to the recognizer.
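As an illustration of what such preprocessing can look like, here is a minimal spectral-subtraction sketch in Python with NumPy. All names and parameters here are my own for illustration; nsh's post may well recommend a different tool, and real noise suppressors estimate the noise spectrum from silent regions rather than receiving it directly:

```python
import numpy as np

def spectral_subtraction(signal, noise_estimate, frame_len=256, over_sub=1.0):
    """Suppress stationary noise by subtracting an estimated average noise
    magnitude spectrum from each frame (illustrative sketch only)."""
    # Average magnitude spectrum over frames of a noise-only segment.
    usable = len(noise_estimate) // frame_len * frame_len
    noise_frames = noise_estimate[:usable].reshape(-1, frame_len)
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1)).mean(axis=0)

    out = np.zeros_like(signal, dtype=float)
    n_frames = len(signal) // frame_len
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        spec = np.fft.rfft(frame)
        # Subtract the noise magnitude, keep the noisy phase,
        # and half-wave rectify so magnitudes stay non-negative.
        mag = np.maximum(np.abs(spec) - over_sub * noise_mag, 0.0)
        phase = np.angle(spec)
        out[i * frame_len:(i + 1) * frame_len] = np.fft.irfft(
            mag * np.exp(1j * phase), n=frame_len)
    return out
```

A production setup would frame with overlap and a window function, but even this crude version shows the idea: attenuate energy in bins dominated by noise before the audio reaches the recognizer.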

You could add phone line noise to VoxForge audio programmatically and train new acoustic models using the modified audio.  See David Gelbart's caveats on this approach in this post: Acoustic model for mobile devices
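A rough sketch of that kind of augmentation, in Python with NumPy, might band-limit the audio to the classic 300-3400 Hz telephone band and add white noise at a chosen SNR. The function name and parameters are hypothetical, and Gelbart's caveat applies: artificial distortion only helps if it resembles the real channel:

```python
import numpy as np

def telephone_channel(audio, rate=8000, snr_db=20, numtaps=101, seed=0):
    """Roughly simulate a phone line: band-limit to ~300-3400 Hz and
    add white noise at a target SNR (illustrative sketch only)."""
    # Windowed-sinc band-pass FIR: low-pass at 3400 Hz minus low-pass at 300 Hz.
    n = np.arange(numtaps) - (numtaps - 1) / 2

    def lowpass(fc):
        h = (2 * fc / rate) * np.sinc(2 * fc / rate * n)
        return h * np.hamming(numtaps)

    band = lowpass(3400) - lowpass(300)
    shaped = np.convolve(audio, band, mode="same")

    # Add white noise scaled to the requested SNR relative to the shaped signal.
    rng = np.random.default_rng(seed)
    sig_power = np.mean(shaped ** 2)
    noise = rng.standard_normal(len(shaped)) * np.sqrt(
        sig_power / 10 ** (snr_db / 10))
    return shaped + noise
```

Passing each VoxForge recording through something like this before training would give the models exposure to band-limited, noisy speech closer to the conference-call conditions.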

--- (Edited on 9/12/2010 12:17 am [GMT-0400] by kmaclean) ---
