Re: worse audio -> better ASR?


worse audio -> better ASR?

User: timobaumann
Date: 7/20/2008 4:54 pm

Views: 5600
Rating: 9

Hi,

I have recently measured the influence of added white noise on my ASR results. For this, I used sox to add a random value between -N and N to each sample. This produces white noise, and by choosing N I can set different signal-to-noise ratios (SNR, in decibels) for my audio. The results are as follows:

SNR (dB)   WER (%)   SER (%)
orig          18.8      68.2
20            16.4      61.2
15            17.7      67.1
10            33.0      90.6
5             89.2     100.0
0             97.7     100.0
-5            99.7     100.0
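
For anyone who wants to reproduce noise levels like these, here is a minimal sketch of how N could be derived from a target SNR. It is not the sox command used for the table above. It assumes NumPy, audio held as a float array, and noise drawn uniformly from [-N, N] (which has power N^2/3); the function names are hypothetical, and clipping to the sample range is not handled.

```python
import numpy as np


def noise_amplitude_for_snr(signal, snr_db):
    """Return N so that uniform noise on [-N, N] gives the target SNR (dB)."""
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  solve for P_noise
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    # Uniform noise on [-N, N] has variance (power) N^2 / 3
    return np.sqrt(3.0 * noise_power)


def add_white_noise(signal, snr_db, seed=0):
    """Add uniform white noise to `signal` at roughly `snr_db` decibels."""
    signal = np.asarray(signal, dtype=np.float64)
    n = noise_amplitude_for_snr(signal, snr_db)
    rng = np.random.default_rng(seed)
    return signal + rng.uniform(-n, n, size=signal.shape)


def measured_snr_db(clean, noisy):
    """Measure the SNR (dB) of `noisy` relative to the known `clean` signal."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noisy, dtype=np.float64) - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))


if __name__ == "__main__":
    # Stand-in for real speech: one second of a 440 Hz tone at 16 kHz
    t = np.arange(16000) / 16000.0
    clean = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)
    for target in (20, 15, 10, 5, 0, -5):
        noisy = add_white_noise(clean, target)
        print(target, round(measured_snr_db(clean, noisy), 2))
```

Note that the SNR here is computed over the whole file, so recordings with long silences will have a lower SNR during the actual speech than the nominal figure suggests.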

OK, obviously, with loud noise I hardly recognize anything. That's fine with me. But why in the world does moderate additive noise give *better* results than my original audio? The noise is quite audible at an SNR of 15 dB, yet the results are still better than with the unaltered (crisp-sounding) audio. Not to mention the 20 dB audio, which is ridiculously good!

Now, have any of you ever experienced the same thing, or do you have an idea why in the world white noise would improve my ASR performance?

Thanks for any pointers,
Timo

--- (Edited on 2008-07-20 23:54 [GMT+0200] by timobaumann) ---

white noise blocks out background noise

User: ralfherzog
Date: 7/20/2008 6:40 pm

Views: 195
Rating: 7

Hello Timo,

(1) This reminds me of the following test with MP3 versus WAV files:

"The tests with HTK [...] show an improvement in performance in using mp3 based audio."

So it is possible to get better results with lossy MP3 files than with lossless WAV files.

(2) Is there background noise in your environment? Take a look at Wikipedia:

"White noise CDs, when used with headphones, can aid concentration by blocking out irritating or distracting noises in a person's environment."

It is possible that the added white noise is blocking out background noise from your environment. In my opinion, this is a reasonable explanation for why your speech recognition software gives better results when white noise is added.

Greetings, Ralf