VoxForge
Hello everybody
I have some problems with the accuracy of the German acoustic model. I tested it with the Sphinx-4 HelloDigits and WavFile examples. The English model works fine and recognized nearly 90%.
But the German one gives really bad results. At best it recognizes about 10% of the spoken words. I recorded 16 kHz WAV files and tested them with the WavFile demo. The results were better, but still only up to 50% recognition.
The test.wav file (http://www.mediafire.com/?j1l9d0ujmgg) is recognized completely correctly.
I tried the following files in various combinations:
I only used the config file from the example (http://www.mediafire.com/?j1l9d0ujmgg) and made some adaptations for the microphone support.
What is going wrong here? I use Eclipse with Java SE 1.6.0_14 on Windows XP for testing. I have also tried two different microphones.
It would be nice if you could help me.
> And made some adaptations for the microphone support.
Your changes were probably not quite right. There could be several issues here:
1. sphinx4 doesn't work well on Windows with a microphone (a known issue, should be fixed in trunk).
2. The wave file you were trying to recognize had the wrong sample rate.
I suggest you upload your modifications so we can take a look and try them ourselves.
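Issue 2 can be checked without sphinx4 at all. The following is only a minimal sketch using the standard javax.sound.sampled API: it reads the sample rate from a WAV header so it can be compared with the rate the acoustic model expects (16000 Hz is assumed here, based on the 16 kHz recordings mentioned above). The class name WavFormatCheck and its helper methods are made up for illustration and are not part of sphinx4.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper, not part of sphinx4.
final class WavFormatCheck {

    // Sample rate in Hz as declared in the header of a WAV file on disk.
    static float sampleRateOf(File wav)
            throws IOException, UnsupportedAudioFileException {
        return AudioSystem.getAudioFileFormat(wav).getFormat().getSampleRate();
    }

    // Same check for WAV data already held in memory.
    static float sampleRateOf(byte[] wavBytes) {
        try {
            // ByteArrayInputStream supports mark/reset, which the reader needs.
            return AudioSystem.getAudioFileFormat(new ByteArrayInputStream(wavBytes))
                    .getFormat().getSampleRate();
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable WAV stream", e);
        }
    }

    // True if the declared rate equals the rate the model was trained for.
    static boolean matchesModelRate(float fileRate, float modelRate) {
        return fileRate == modelRate;
    }

    // Builds a short silent 16-bit mono WAV in memory, only for trying the check.
    static byte[] silentWav(float sampleRate, int frames) {
        AudioFormat fmt = new AudioFormat(sampleRate, 16, 1, true, false);
        byte[] pcm = new byte[frames * 2]; // 2 bytes per 16-bit mono frame
        AudioInputStream ais =
                new AudioInputStream(new ByteArrayInputStream(pcm), fmt, frames);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            AudioSystem.write(ais, AudioFileFormat.Type.WAVE, out);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return out.toByteArray();
    }
}
```

For a real recording one would call WavFormatCheck.sampleRateOf(new File("test.wav")) and compare the result with 16000. Note that the header only states the declared rate; it says nothing about whether the audio was resampled earlier.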
The sample rate of my wave file is the same as that of test.wav.
If you could give me your configuration file for microphone testing and a recorded WAV file, that would be nice, too.
I only copied the live frontend part of the HelloDigits example to test microphone speech.
Hi Falco
Sorry for the delay, I finally looked into this. I didn't check the microphone input, only the wavfile part.
So far my thoughts are:
0) Configuration looks ok.
1) Accuracy is about 40%, which is OK for this type of audio.
2) The audio is not raw recordings. The recordings with N at the beginning are missing the spectrum above 5 kHz. I suppose they were upsampled from 11.025 kHz recordings. That's a bad thing, because upsampling doesn't give you back the spectral information you've already lost. You need to train an 8 kHz model that uses only frequencies up to 3500 Hz.
3) The files without N at the start have some masking in the spectrum. Did you convert them from mp3? With mp3 it's critical to use a suitable decoder to get good accuracy; often the result is just plain broken. Again, you need adaptation or a special type of model.
4) The DE model is obviously not perfect and needs improvement.
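The observation in point 2 can be reproduced roughly in code. A recording sampled at 11.025 kHz carries no content above half that rate (about 5.5 kHz), so after upsampling to 16 kHz the band from roughly 5.5 kHz to 8 kHz stays empty. The sketch below is only an illustration, not the method used for the analysis above: it runs a naive DFT over one frame of samples and reports the fraction of energy at or above a cutoff. The class BandEnergy, its methods, and the choice of 5000 Hz as the cutoff are assumptions made for the example.

```java
// Hypothetical helper for illustration; a plain O(n^2) DFT, fine for short frames.
final class BandEnergy {

    // Fraction (0..1) of spectral energy at or above cutoffHz in one frame.
    // The DC bin is skipped so a constant offset does not dominate.
    static double highBandRatio(double[] frame, double sampleRate, double cutoffHz) {
        int n = frame.length;
        double total = 0.0;
        double high = 0.0;
        for (int k = 1; k <= n / 2; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = 2.0 * Math.PI * k * t / n;
                re += frame[t] * Math.cos(angle);
                im -= frame[t] * Math.sin(angle);
            }
            double power = re * re + im * im;
            total += power;
            // Bin k corresponds to the frequency k * sampleRate / n.
            if (k * sampleRate / n >= cutoffHz) {
                high += power;
            }
        }
        return total == 0.0 ? 0.0 : high / total;
    }

    // Generates n samples of a unit-amplitude sine, used to try the check.
    static double[] sine(double freqHz, double sampleRate, int n) {
        double[] out = new double[n];
        for (int t = 0; t < n; t++) {
            out[t] = Math.sin(2.0 * Math.PI * freqHz * t / sampleRate);
        }
        return out;
    }
}
```

On a genuine 16 kHz speech frame one would expect a small but clearly non-zero ratio for a 5000 Hz cutoff, while a frame upsampled from a lower rate should stay very close to zero. The threshold to use is a judgment call and depends on the audio, so treat the number as a hint rather than a test.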