worse audio -> better ASR?

Acoustic Model Discussions

User: timobaumann
Date: 7/20/2008 4:54 pm

Hi,

I recently measured the influence of added white noise on my ASR results. To do this, I used sox to add a random value between -N and N to each sample. This produces white noise, and by choosing N I can obtain different signal-to-noise ratios (SNR, in decibels) in my audio. The results are as follows:

SNR (dB)   WER (%)   SER (%)
orig       18.8      68.2
20         16.4      61.2
15         17.7      67.1
10         33.0      90.6
5          89.2      100.0
0          97.7      100.0
-5         99.7      100.0
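
For anyone who wants to reproduce this kind of experiment, here is a minimal sketch in Python of how N could be chosen for a target SNR. It is not the sox invocation used for the numbers above. The function names are made up for illustration. It assumes the noise is uniform on [-N, N] (so its power is N^2 / 3) and that SNR is defined as 10 * log10(signal power / noise power) over the whole file.

```python
import math
import random


def noise_amplitude(samples, snr_db):
    """Return N so that uniform noise on [-N, N] gives the target SNR (in dB).

    Assumption: SNR = 10 * log10(P_signal / P_noise), with power measured
    as the mean squared sample value over the whole signal.
    """
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    # A uniform distribution on [-N, N] has variance N^2 / 3.
    return math.sqrt(3 * noise_power)


def add_white_noise(samples, snr_db, seed=None):
    """Add an independent uniform random value in [-N, N] to each sample."""
    rng = random.Random(seed)
    n = noise_amplitude(samples, snr_db)
    return [s + rng.uniform(-n, n) for s in samples]


def measured_snr_db(clean, noisy):
    """Measure the SNR actually obtained, given the clean and noisy signals."""
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum((y - s) ** 2 for s, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # One second of a 440 Hz tone at 16 kHz stands in for real speech here.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    for target in (20, 15, 10, 5, 0, -5):
        noisy = add_white_noise(clean, target, seed=0)
        print(target, round(measured_snr_db(clean, noisy), 2))
```

Note that real recordings contain silence, so an SNR computed over the whole file differs from one computed over speech segments only. Sample clipping and integer rounding are also ignored in this sketch.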

OK, obviously, with loud noise I hardly recognize anything. That's fine with me. But why in the world does moderate additive noise give *better* results than my original audio? The noise is quite audible at an SNR of 15 dB, but still, the results are better than for the unaltered (crisp-sounding) audio. Not to mention the 20 dB audio, which is ridiculously good!
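
To make the two metrics in the table concrete, here is a rough sketch of how such figures are commonly computed. It assumes WER is the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words, and that SER is the share of sentences with at least one error. The thread does not say which scoring tool was used, so the function names and the example transcripts are purely illustrative.

```python
def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1], len(ref)


def wer_and_ser(pairs):
    """Return (WER, SER) in percent for (reference, hypothesis) pairs."""
    total_errors = total_words = wrong_sentences = 0
    for reference, hypothesis in pairs:
        errors, words = word_errors(reference, hypothesis)
        total_errors += errors
        total_words += words
        wrong_sentences += errors > 0
    return 100 * total_errors / total_words, 100 * wrong_sentences / len(pairs)


if __name__ == "__main__":
    # Made-up transcripts, only to exercise the functions.
    pairs = [
        ("the cat sat", "the cat sat"),
        ("a b c d", "a x c"),
    ]
    print(wer_and_ser(pairs))
```

Under these definitions a single wrong word makes the whole sentence count as an error, which is why SER is much higher than WER in every row and saturates at 100 while WER is still climbing.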

Now, have any of you ever experienced the same thing, or do you have an idea why in the world white noise would improve my ASR performance?

Thanks for any pointers,
Timo

--- (Edited on 2008-07-20 23:54 [GMT+0200] by timobaumann) ---

white noise blocks out background noise

User: ralfherzog
Date: 7/20/2008 6:40 pm

Hello Timo,

(1) This reminds me of the following test with MP3 versus WAV files:

"The tests with HTK [...] show an improvement in performance in using mp3 based audio."

So it is possible that you get better results with lossy MP3 files than with lossless WAV files.

(2) Is there background noise in your environment? Take a look at Wikipedia:

"White noise CDs, when used with headphones, can aid concentration by blocking out irritating or distracting noises in a person's environment."

It is possible that the added white noise is masking background noise from your environment. In my opinion, this is a reasonable explanation for why you get better results from your speech recognition software when you add white noise.

Greetings, Ralf

--- (Edited on 2008-07-20 6:40 pm [GMT-0500] by ralfherzog) ---

Re: worse audio -> better ASR?

User: kmaclean
Date: 7/21/2008 5:34 pm

Hi Timo,

Check out this thread: Acoustic model for mobile device

David Gelbart's post (3rd one) says:

It can also be useful to add noise or reverberation to the training data to make it match the target better.
A Google Scholar search for: stahl speech recognition
will turn up some papers on the subject co-authored by Volker Stahl. Other authors have also published on this subject but I can't remember any names offhand.

Ken

--- (Edited on 7/21/2008 6:34 pm [GMT-0400] by kmaclean) ---
