Re: noise training in HTK

noise training in HTK

User: ubanov
Date: 11/13/2008 4:56 pm

Views: 9539
Rating: 3

Hi,

In Sphinx there are phonemes for noises, and the documentation stresses the importance of these sounds.

Would it be necessary to train noises in order to recognize these sounds as noise in HTK?

(I have trained sounds, and each time I make noises I get very strange results.) Or maybe I need more voice data for training, and then the noises will be ignored? (The training data is Spanish, with 20-30 minutes of speech.)

Thanks in advance.

--- (Edited on 11/13/2008 4:56 pm [GMT-0600] by ubanov) ---

Re: noise training in HTK

User: kmaclean
Date: 11/13/2008 9:42 pm

Views: 198
Rating: 3

Hi Ubanov,

>Would it be necessary to train noises in order to recognize these sounds as noise in HTK?

Yes, we should train for noise in our speech submissions. I have not been too concerned about this because it can be added after the fact (i.e. we can add noise tags to the transcriptions/prompts of user submissions).

>Or maybe I need more voice data for training, and then the noises will be ignored?

It is best to train with clean speech (i.e. with no noise). See these posts for more on this topic:

Issues in Collecting Speech Audio for Free GPL Speech Corpus (Arthur Chan - Sphinx)
What are Best Practices for Collecting Speech for a Free GPL Speech Corpus? (David Gelbart)
More on Collecting Speech Audio for Free GPL Speech Corpus (Joe Picone -ISIP)
Comments on: "A good acoustic model needs to be trained with speech recorded in the environment it is targeted to recognize"

The article referenced at this link might also be useful: The Production of Speech Corpora

Ken

--- (Edited on 11/13/2008 10:42 pm [GMT-0500] by kmaclean) ---

Re: noise training in HTK

User: Chetanji
Date: 2/6/2009 3:25 am

Views: 127
Rating: 2

I am a beginner in creating a quality ASR.

One major question regards the environment I am targeting.

If I record my training data with the same poor-quality microphone that the users tend to use, there will be an overlay of noise that is consistent throughout the whole utterance.

There are no tags for this noise, as it becomes part of the phones of each spoken word.

So we have a three-dimensional problem in the data recording.

1) The atmosphere the microphone records in (including the additional noise the connection makes to the digitizer in the computer).

2) The sounds the mouth, throat, etc. make during recording, i.e. grunts, teeth clicks, huhs, haas, lip smacking, hissing, etc., all of it coming out of the speaker's mouth. This is one area for noise tags. The other would be the people and machinery around us while recording, e.g. phones, fans, beepers, doors, sirens, a baby crying, loud talking, etc.

3) Finally, we get to the actual words spoken in continuous speech.

This is the nature of my problem, and it is taking a lot of time to come to a proper understanding of how to record an acoustic database for a hospital in southern India. Many offices have open ceilings with a high roof (humid tropical environment).

Thousands of people come to this hospital every day. It is quite noisy.

I found not long ago that raising the amplitude of the recorded wav files by 6 dB took me from a horrible WER of 93% to a quite acceptable 12%.

These wav files were recorded with a very noisy mic.

Listening to a recording from the open microphone was like listening to a wind tunnel, with strange noises from time to time.

By raising the amplitude by 6 dB I was making the spoken words rise up through the cloud obscuring them, so that the recognizer could hear them.

Does this make any sense to anyone?

And how do I create a quality ASR in this environment???

Blessings,

Chetanji

--- (Edited on 2/6/2009 3:25 am [GMT-0600] by Visitor) ---

Re: noise training in HTK

User: nsh
Date: 2/6/2009 3:41 am

Views: 2668
Rating: 3

And you are here as well.

> I found not long ago that raising the amplitude of the recorded wav files by 6 dB took me from a horrible WER of 93% to a quite acceptable 12%.

It was a mistake. Amplitude is not directly related to WER. You probably just made the frontend work better in the endpointer.
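
To illustrate the point about amplitude and the endpointer, here is a minimal Python sketch. It is only a toy model and not how any particular recognizer's frontend works: the sine-wave "speech" and "noise", the `is_speech` function, and the `THRESHOLD` value are all assumptions made up for the example. It shows that a 6 dB gain roughly doubles the amplitude and leaves the signal-to-noise ratio unchanged, yet can push a frame over a fixed energy threshold.

```python
import math

def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(speech, noise):
    """Signal-to-noise ratio in dB from separate speech and noise samples."""
    return 20 * math.log10(rms(speech) / rms(noise))

def is_speech(frame, threshold):
    """Toy endpointer: a frame counts as speech if its RMS exceeds a fixed threshold."""
    return rms(frame) > threshold

# Synthetic stand-ins for one frame of speech and of background noise.
speech = [0.05 * math.sin(0.2 * n) for n in range(160)]
noise = [0.01 * math.sin(0.9 * n) for n in range(160)]
recorded = [s + v for s, v in zip(speech, noise)]

gain = db_to_gain(6.0)  # roughly a factor of 2 in amplitude
boosted = [gain * x for x in recorded]

# The gain scales speech and noise alike, so the SNR does not move.
snr_before = snr_db(speech, noise)
snr_after = snr_db([gain * s for s in speech], [gain * v for v in noise])

THRESHOLD = 0.05  # assumed fixed endpointer threshold, chosen for illustration
print(f"gain factor: {gain:.3f}")
print(f"SNR before/after: {snr_before:.2f} dB / {snr_after:.2f} dB")
print("detected before boost:", is_speech(recorded, THRESHOLD))
print("detected after boost: ", is_speech(boosted, THRESHOLD))
```

In this toy setup the quiet frame is missed before the boost and detected after it, even though the noise was amplified by exactly the same factor.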

> And how do I create a quality ASR in this environment???

Noise training is not directly related to HTK, since HTK itself has no methods to deal with noise. The usual solutions are:

1) noise cancellation in the frontend processor

2) special noise-robust features like RASTA

3) better classifiers, either offline (discriminative training) or online (HMM-ANN or HMM-SVM methods)

Noise cancellation is probably the easiest thing to start with.

--- (Edited on 2/6/2009 3:41 am [GMT-0600] by nsh) ---

Re: noise training in HTK

User: kmaclean
Date: 2/6/2009 9:21 am

Views: 3062
Rating: 3

Hi Chetanji,

Julius can perform noise reduction while recognizing, using spectral subtraction (not sure if Sphinx can do this). From the Julius manual (this is for an older version of Julius; the newer one may have more options):

       -sscalc
              Perform spectral subtraction using head part of each file. With
              this option, Julius assume there are certain length of silence
              at each input file. Valid only for rawfile input. Conflict
              with "-ssload".

       -ssload filename
              Perform spectral subtraction for speech input using
              pre-estimated noise spectrum from file. The noise spectrum data
              should be computed beforehand by mkss. Valid for all speech
              input. Conflict with "-sscalc".

       -ssalpha value
              Alpha coefficient of spectral subtraction for "-sscalc" and
              "-ssload". Noise will be subtracted stronger as this value gets
              larger, but distortion of the resulting signal also becomes
              remarkable. (default: 2.0)

       -ssfloor value
              Flooring coefficient of spectral subtraction. The spectral
              parameters that go under zero after subtraction will be
              substituted by the source signal with this coefficient
              multiplied. (default: 0.5)
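
As a rough illustration of what the alpha and floor coefficients described above do, here is a small Python sketch of textbook spectral subtraction. It is not taken from the Julius source: the function names `estimate_noise` and `spectral_subtract` are made up for this example, and it assumes each frame is already a list of power-spectrum values, which may differ from the representation Julius actually works on.

```python
def estimate_noise(frames, n_head):
    """Average the first n_head spectral frames, assumed to contain only noise.

    This mirrors the idea behind "-sscalc" (use the head of the file as silence).
    """
    head = frames[:n_head]
    n_bins = len(head[0])
    return [sum(frame[k] for frame in head) / len(head) for k in range(n_bins)]

def spectral_subtract(frame, noise, alpha=2.0, floor=0.5):
    """Subtract alpha * noise from each bin of the frame.

    Bins that go below zero are replaced by floor * (original bin value),
    following the description of "-ssfloor" in the manual excerpt.
    """
    cleaned = []
    for value, noise_value in zip(frame, noise):
        diff = value - alpha * noise_value
        cleaned.append(floor * value if diff < 0 else diff)
    return cleaned

# Toy data: two leading noise-only frames, then one frame with energy in bin 0.
frames = [
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [9.0, 1.5, 1.0],
]
noise_est = estimate_noise(frames, n_head=2)
cleaned = spectral_subtract(frames[2], noise_est, alpha=2.0, floor=0.5)
print(noise_est)
print(cleaned)
```

With a larger alpha, more bins fall below zero and get replaced by the floored value, which is the distortion trade-off the manual mentions.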

This does not help you on the recording end (you still need speech that is as clean as you can get), but for recognition, spectral subtraction might help.

Theoretically, you might be able to use two microphones: one to pick up the target speech and another to pick up the background noise, which you would feed into the Julius spectral subtraction algorithm. I have read of noise-cancelling headsets that use two microphones in a similar way.

Ken

--- (Edited on 2/6/2009 10:21 am [GMT-0500] by kmaclean) ---
