low recognition accuracy using Pocketsphinx
User: mohemara92
Date: 9/21/2018 5:54 am

Hello,
I am trying to recognize German speech using the language model "cmusphinx-ptm-generic-de-r20180609", but the accuracy is only about 35%, which is too low.
I recorded a .wav file of a native German speaker using the following command:
rec -V1 -q -r 16000 -c 1 -b 16 -e signed-integer --endian little "test1.wav"
and then ran the following commands:

pocketsphinx_batch -adcin yes -cepdir wav -cepext .wav -ctl test.fileids -lm "file.lm" -dict "file.dic" -hmm "acoustic_model_directory" -hyp test.hyp

word_align.pl test.transcription test.hyp

Is there something wrong? I don't think it is normal to get such low accuracy with the provided models.

test.zip
Re: low recognition accuracy using Pocketsphinx
User: guenter
Date: 9/24/2018 2:01 am

Hi mohemara92,

A couple of remarks:

  • the audio recording is very distorted (you need to lower the recording volume)
  • the audio recording is very long, try splitting it up into individual sentences
  • we are measuring a WER of about 30-40% for the Sphinx PTM models with unknown speakers and background noise. Try the Kaldi models if you need better results
  • try using a language model tailored to your application domain (that will help Sphinx as well as Kaldi)
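
For readers comparing their own numbers against the 30-40% figure: WER (word error rate) is the word-level edit distance between the reference transcription and the hypothesis, divided by the number of reference words. The following is only a minimal Python sketch of that definition. The function name is made up for illustration, and it does not reproduce any normalization (case, punctuation) that word_align.pl may apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcription must not be empty")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = cur[j - 1] + 1   # extra word in hypothesis
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)
```

Note that WER can exceed 100% when the hypothesis contains many inserted words, so "accuracy" and "100% minus WER" are not always the same number.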

 

cheers,

   guenter
Re: low recognition accuracy using Pocketsphinx
User: mohemara92
Date: 9/25/2018 4:08 am

Hi Guenter,

Thanks a lot for helping.

I have some questions related to your remarks:

  • Is there an optimal volume level for the recordings, or does it depend on the microphone, environment, etc.?
  • How long should one recording be? The attached one is only 30 seconds long.
  • Are the PTM models dedicated to specific known speakers? If I need to recognize my own voice, do I have to train my own model?
  • Regarding the Kaldi models, do you mean using the Kaldi acoustic models with pocketsphinx, or switching to different software? In the first case, how can I find or build these models? I searched but found no Kaldi acoustic models under https://github.com/uhh-lt/kaldi-tuda-de

    Kind regards,

    Mohamed
Re: low recognition accuracy using Pocketsphinx
User: guenter
Date: 9/25/2018 4:36 am

Hi Mohamed,

  • Is there an optimal volume level for the recordings, or does it depend on the microphone, environment, etc.?

As a rule of thumb, when you check your recording in Audacity, only a few samples should peak outside the ±0.5 range.
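
The same check can be roughly approximated without Audacity. The sketch below is an assumption-laden illustration, not part of any Sphinx tooling: it reads a 16-bit PCM WAV file (the format produced by the rec command earlier in the thread) with Python's standard library, scales the samples to the -1.0 to 1.0 range that Audacity displays, and reports the fraction of samples whose magnitude exceeds 0.5. The function names are hypothetical, and what counts as "only a few" is left to the reader.

```python
import array
import sys
import wave


def read_pcm16(path):
    """Read a 16-bit PCM WAV file and return samples scaled to [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return [s / 32768.0 for s in samples]


def fraction_outside(samples, limit=0.5):
    """Fraction of samples whose absolute value is greater than `limit`."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if abs(s) > limit) / len(samples)


# Example usage (the file name is taken from the first post):
#   print(fraction_outside(read_pcm16("test1.wav")))
```

A value close to zero matches the rule of thumb; a large fraction suggests the recording level is too high.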

  • How long should one recording be? The attached one is only 30 seconds long.

30 seconds is roughly the upper limit, but most of the training material is much shorter, in the 5-12 s range (typically a single sentence).
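
One possible way to cut a long recording into shorter pieces is to split at pauses. The sketch below uses simple energy-based pause detection, which is a technique swapped in here for illustration and not something the thread prescribes. It assumes the samples are already loaded as floats in the -1.0 to 1.0 range, and the thresholds (silence_level, min_pause_s, frame_s) are hypothetical starting values that would need tuning per microphone and room. It does not enforce the 5-12 s range by itself.

```python
import math


def split_at_pauses(samples, rate, silence_level=0.02,
                    min_pause_s=0.3, frame_s=0.02):
    """Return (start, end) sample index pairs for stretches of audio
    that are separated by pauses of at least `min_pause_s` seconds."""
    frame_len = max(1, int(round(rate * frame_s)))
    min_pause_frames = max(1, int(round(min_pause_s / frame_s)))

    # Classify each short frame as loud or quiet by its RMS level.
    loud = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        loud.append(rms > silence_level)

    segments = []
    seg_start = None  # frame index where the current segment began
    quiet_run = 0     # consecutive quiet frames inside the current segment
    for i, is_loud in enumerate(loud):
        if is_loud:
            if seg_start is None:
                seg_start = i
            quiet_run = 0
        elif seg_start is not None:
            quiet_run += 1
            if quiet_run >= min_pause_frames:
                seg_end = i - quiet_run + 1  # first quiet frame of the pause
                segments.append((seg_start * frame_len, seg_end * frame_len))
                seg_start = None
                quiet_run = 0

    # Close a segment that runs to the end of the recording.
    if seg_start is not None:
        seg_end = len(loud) - quiet_run
        segments.append((seg_start * frame_len,
                         min(seg_end * frame_len, len(samples))))
    return segments
```

Each returned pair can then be written out as its own WAV file and listed in the .fileids control file. In practice a little padding around each segment is usually kept so that word onsets are not clipped.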

  • Are the PTM models dedicated to specific known speakers? If I need to recognize my own voice, do I have to train my own model?

Actually, I built the PTM models on request (I am not using CMU Sphinx in any of my projects), so I have very little experience with them. From what I see in the training results, it seems that you cannot use them as-is for general unknown-speaker speech recognition. You will either have to adapt them to a specific voice or tailor the language model to a narrow domain to get useful results (or, even better, do both ;) ).

  • Regarding the Kaldi models, do you mean using the Kaldi acoustic models with pocketsphinx, or switching to different software?

You will have to switch to kaldi-asr to use those.

  • In the first case, how can I find or build these models? I searched but found no Kaldi acoustic models under https://github.com/uhh-lt/kaldi-tuda-de

You can find all our models along with documentation here:

https://github.com/gooofy/zamia-speech#download