VoxForge
Hi,
I am new to Sphinx. I have downloaded the latest release of sphinx3 (0.8) and the nightly build of sphinxTrain, together with the an4 database.
When I run perl scripts_pl/decode/slave.pl, I run into the following problem:
SYSTEM_ERROR: "lm_3g_dmp.c", line 1270: fopen(/scratch/SphinxTutorial/an4/etc/an4.lm.DMP,rb) failed
; No such file or directory
--- (Edited on 2/4/2009 10:49 pm [GMT-0600] by jcwang) ---
>and I cannot find an4.lm.DMP anywhere.
It seems you didn't call setup_tutorial.pl from sphinx3 correctly. Anyhow, you can use an4.ug.lm.DMP if you want, just rename it or change the name in etc/sphinx_decode.cfg.
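A minimal sketch of the workaround, assuming the unigram model an4.ug.lm.DMP sits in the same etc directory the decoder is looking in. The function name is made up for illustration, and it copies rather than renames so the original file stays in place:

```shell
# Hypothetical helper (not part of the Sphinx tutorial scripts): make the
# unigram LM available under the file name the decoder is looking for.
use_unigram_lm() {
    etc_dir=$1
    expected="$etc_dir/an4.lm.DMP"      # name from the fopen() error
    fallback="$etc_dir/an4.ug.lm.DMP"   # unigram model suggested as a substitute
    if [ -f "$expected" ]; then
        echo "already present: $expected"
    elif [ -f "$fallback" ]; then
        cp "$fallback" "$expected" && echo "copied $fallback to $expected"
    else
        echo "neither $expected nor $fallback exists" >&2
        return 1
    fi
}
```

It would be called as use_unigram_lm /scratch/SphinxTutorial/an4/etc. Re-running setup_tutorial.pl is still the cleaner fix.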
> Also, I am wondering what's the difference between an4_train.fileids and an4_test.fileids (and similarly an4_train.transcript and an4_test.transcript)? If I were to create my own an4_train.transcript, how would an4_test.transcript be generated?
Test files are used for testing, train files for training. They shouldn't overlap, since otherwise overtraining can go unnoticed: the model may be tied too closely to the training data and the test results won't show it. Testing is usually done on independent data. The test set is usually about 1/10 the size of the training set.
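One way to get a non-overlapping split at roughly that ratio is to hold out every Nth utterance. The sketch below is only an illustration: the function name and the default N=11 (one test line for every ten train lines) are my own choices, and it assumes one utterance per line.

```shell
# Hypothetical helper: send every Nth line of INPUT to TEST_OUT and all
# other lines to TRAIN_OUT, so no line ends up in both files.
# Usage: split_every_nth INPUT TRAIN_OUT TEST_OUT [N]
split_every_nth() {
    in_file=$1; train_out=$2; test_out=$3; n=${4:-11}
    # Create/empty both outputs so they exist even for very short inputs.
    : > "$train_out"
    : > "$test_out"
    awk -v n="$n" -v train="$train_out" -v held="$test_out" '
        NR % n == 0 { print > held; next }   # every Nth line is held out for testing
        { print > train }                    # everything else is used for training
    ' "$in_file"
}
```

Running it with the same N on the full fileids list and on the full transcript list keeps the two in step, assuming line i of the transcript belongs to line i of the fileids.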
> If I were to create my own an4_train.transcript, how would an4_test.transcript be generated?
The same way as train.transcript. Just listen to the test files and write down the transcription.
--- (Edited on 2/5/2009 12:55 am [GMT-0600] by nsh) ---
Hi Nsh,
Thank you very much for your prompt reply. I really appreciate it. I re-ran setup_tutorial.pl, and the problem indeed went away :). Thanks!
I have another question. Suppose I have pocketsphinx running on a server, and I have produced my own acoustic model and dictionary files by going through sphinxTrain as described in
http://www.speech.cs.cmu.edu/sphinx/tutorial.html
would you know what are the files I should copy over from an4/ to pocketsphinx/? I have searched around, but couldn't find any such info. If you know of any link describing this, please let me know as well. BTW, my intention is to replace the default acoustic model/dictionary that comes with pocketSphinx with my own set (with a very limited number of words/phrases).
Again, thank you very much for your help! I really appreciate it.
regards,
Jimmy
--- (Edited on 2/5/2009 2:19 am [GMT-0600] by jcwang) ---
> would you know what are the files I should copy over from an4/ to pocketsphinx/?
You shouldn't copy them. As for the files, the acoustic model is inside model_parameters, in a folder named like db_name.cd_cont_3000 or db_name.mllt_cd_cont_3000. It contains mdef, feat.params, variances, noisedict and so on: everything needed for the pocketsphinx -hmm parameter.
The language model (-lm) and the dictionary (-dict) are in etc, of course. That's all you need to run pocketsphinx.
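A quick sanity check before pointing -hmm at a trained model folder could look like the sketch below. The function name is hypothetical, and the file list is only what this thread mentions (mdef, feat.params, variances, noisedict, plus the means file pocketsphinx reads at startup); a real model folder may contain more.

```shell
# Hypothetical helper: report which of the expected acoustic model files
# are missing from a model_parameters/<db_name>.cd_cont_* folder.
check_hmm_dir() {
    dir=$1
    missing=0
    for f in mdef feat.params means variances noisedict; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"   # 0 if all listed files are present, 1 otherwise
}
```

For example: check_hmm_dir model_parameters/db_name.cd_cont_3000 && echo "looks complete".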
>my intention is to replace the default acoustic model/dictionary that comes with pocketSphinx
If you are going to recognize a few English words, I'd consider using the existing models. It's not that trivial to train a good model yourself. Just replace the grammar and the dictionary.
--- (Edited on 2/5/2009 7:54 pm [GMT-0600] by nsh) ---
Hi Nsh,
Again, thank you so much for your help!
I tried to decode using pocketsphinx_continuous as follows:
S2CONTINUOUS=/usr/local/bin/pocketsphinx_continuous
HMM=//source/PocketSphinx/myTest1/model_parameters/myTest1.cd_cont_1000/
LMFILE=/source/sphinxInfo/trainInfo/etc/7677.lm.DMP
DICT=/source/sphinxInfo/trainInfo/etc/7677.dic
echo "<executing $S2CONTINUOUS, please wait>"
$S2CONTINUOUS \
-fwdflat no -bestpath no \
-lm ${LMFILE} \
-dict ${DICT} \
-hmm ${HMM} \
-samprate 8000 \
-nfft 256 "$@"
INFO: s2_semi_mgau.c(1120): Reading S3 mixture gaussian file '//source/PocketSphinx/myTest1/model_parameters/myTest1.cd_cont_1000//means'
FATAL_ERROR: "s2_semi_mgau.c", line 1150: //source/PocketSphinx/myTest1/model_parameters/myTest1.cd_cont_1000//means: #codebooks (360) != 1
--- (Edited on 2/5/2009 8:26 pm [GMT-0600] by jcwang) ---
Hi,
Just want to add that I generated an adapted acoustic model and the result is pretty good. Previously, I used the default acoustic model with my own dictionary file, and the accuracy rate (in recognizing the 10 names I entered) is about 50%. With the adapted model, it's about 80% accurate!
The remaining two names that Pocketsphinx couldn't decode correctly are both foreign, and are harder to pronounce.
If possible, I'd still like to try out my own acoustic model. If anyone knows what went wrong (please see my posting above), or has any suggestions on ways of developing a reliable custom acoustic model for names, please let me know. Thank you very much for your help!
regards,
Jimmy
--- (Edited on 2/6/2009 12:22 am [GMT-0600] by jcwang) ---
> FATAL_ERROR: "s2_semi_mgau.c", line 1150: //source/PocketSphinx/myTest1/model_parameters/myTest1.cd_cont_1000//means: #codebooks (360) != 1
You forgot -feat 1s_c_d_dd. You don't need -nfft; it is taken automatically from feat.params.
> Just want to add that I generated an adapted acoustic model and the result is pretty good. Previously, I used the default acoustic model with my own dictionary file, and the accuracy rate (in recognizing the 10 names I entered) is about 50%. With the adapted model, it's about 80% accurate!
Well, for a few names it should be 98% accurate, not 80%. Did you change the feature parameters, like -upperf 3500 -lowerf 200 -nfilt 31? You need to change these for 8 kHz. To make sure you are doing everything correctly, describe what you did more precisely and describe the results: how big is your test set, how big is its vocabulary, what is the WER?
> The remaining two names that Pocketsphinx couldn't decode correctly are both foreign, and are harder to pronounce.
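For reference, WER is the number of word substitutions, deletions and insertions needed to turn the reference transcription into the recognizer output, divided by the number of reference words. The sketch below computes it with a plain edit-distance in awk. It is not one of the Sphinx scoring tools, the function name is made up, and it assumes two plain text files with one utterance per line, in the same order and without utterance IDs.

```shell
# Hypothetical helper: word error rate between a reference file and a
# hypothesis file (one utterance per line, same order in both files).
word_error_rate() {
    ref_file=$1; hyp_file=$2
    [ -s "$ref_file" ] || { echo "empty or missing reference: $ref_file" >&2; return 1; }
    awk '
        function min3(a, b, c,    m) { m = a; if (b < m) m = b; if (c < m) m = c; return m }
        # Word-level Levenshtein distance between reference r and hypothesis h.
        function edits(r, h,    R, H, nr, nh, i, j, d, cost) {
            nr = split(r, R, " "); nh = split(h, H, " ")
            for (i = 0; i <= nr; i++) d[i, 0] = i
            for (j = 0; j <= nh; j++) d[0, j] = j
            for (i = 1; i <= nr; i++)
                for (j = 1; j <= nh; j++) {
                    cost = (R[i] == H[j]) ? 0 : 1
                    d[i, j] = min3(d[i-1, j] + 1, d[i, j-1] + 1, d[i-1, j-1] + cost)
                }
            return d[nr, nh]
        }
        NR == FNR { ref[FNR] = $0; next }           # first file: store reference lines
        {                                           # second file: score each hypothesis
            errors += edits(ref[FNR], $0)
            words  += split(ref[FNR], tmp, " ")
        }
        END {
            if (words == 0) exit 1                  # nothing was scored
            printf "WER: %.1f%% (%d errors / %d reference words)\n", 100 * errors / words, errors, words
        }
    ' "$ref_file" "$hyp_file"
}
```

Usage would be word_error_rate ref.txt hyp.txt, where both file names are placeholders.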
You sometimes need to correct the dictionary as well.
--- (Edited on 2/6/2009 3:17 am [GMT-0600] by nsh) ---
Hi Nsh,
Thank you very much for your help!
For the adapted model, the following is what I did. If I did anything incorrectly or any improvement is needed, please do not hesitate to let me know:
1. Prepared the required files and recordings:
myTest1.dic
myTest1.listoffiles
myTest1.transcription
myTest1.txt and
myTest1_0001.raw to myTest9_0001.raw.
The dictionary file contained the following:
AGGARWAL AH G AA R W AH L
BIRAJA B AY R AE JH AH
CHIOU CH AY UW
DEEPALI D IY EH P AH L IY
DEGLURKAR D IH G L AH R K AH R
DEVULAPALLI D IH V Y UW L AH P AE L IY
HENRY HH EH N R IY
JIMMY JH IH M IY
JOON JH UW N
KEVIN K EH V IH N
LEE L IY
RUCHI R AH CH IY
SKALAHAS S K AE L AH HH AH Z
SOUMYA S AW M AY AH
WANG W AE NG
YE Y IY
YE(1) Y EH
YEH Y EH
YINGQING Y IH N G K AH NG
http://www.speech.cs.cmu.edu/cmusphinx/moinmoin/AcousticModelAdaptation
where 7677.lm is as follows:
Language model created by QuickLM on Tue Feb 3 19:22:00 EST 2009
Copyright (c) 1996-2000
Carnegie Mellon University and Alexander I. Rudnicky
This model based on a corpus of 9 sentences and 20 words
The (fixed) discount mass is 0.5
\data\
ngram 1=20
ngram 2=27
ngram 3=18
\1-grams:
-0.9031 </s> -0.3010
-0.9031 <s> -0.2430
-1.8573 AGGARWAL -0.2430
-1.8573 BIRAJA -0.2950
-1.8573 CHIOU -0.2430
-1.8573 DEEPALI -0.2950
-1.8573 DEGLURKAR -0.2430
-1.8573 DEVULAPALLI -0.2430
-1.8573 HENRY -0.2950
-1.8573 JIMMY -0.2950
-1.8573 JOON -0.2950
-1.8573 KEVIN -0.2950
-1.8573 LEE -0.2430
-1.8573 RUCHI -0.2950
-1.8573 SKALAHAS -0.2430
-1.8573 SOUMYA -0.2950
-1.8573 WANG -0.2430
-1.8573 YE -0.2430
-1.8573 YEH -0.2430
-1.8573 YINGQING -0.2950
\2-grams:
-1.2553 <s> BIRAJA 0.0000
-1.2553 <s> DEEPALI 0.0000
-1.2553 <s> HENRY 0.0000
-1.2553 <s> JIMMY 0.0000
-1.2553 <s> JOON 0.0000
-1.2553 <s> KEVIN 0.0000
-1.2553 <s> RUCHI 0.0000
-1.2553 <s> SOUMYA 0.0000
-1.2553 <s> YINGQING 0.0000
-0.3010 AGGARWAL </s> -0.3010
-0.3010 BIRAJA DEVULAPALLI 0.0000
-0.3010 CHIOU </s> -0.3010
-0.3010 DEEPALI DEGLURKAR 0.0000
-0.3010 DEGLURKAR </s> -0.3010
-0.3010 DEVULAPALLI </s> -0.3010
-0.3010 HENRY YEH 0.0000
-0.3010 JIMMY WANG 0.0000
-0.3010 JOON LEE 0.0000
-0.3010 KEVIN CHIOU 0.0000
-0.3010 LEE </s> -0.3010
-0.3010 RUCHI AGGARWAL 0.0000
-0.3010 SKALAHAS </s> -0.3010
-0.3010 SOUMYA SKALAHAS 0.0000
-0.3010 WANG </s> -0.3010
-0.3010 YE </s> -0.3010
-0.3010 YEH </s> -0.3010
-0.3010 YINGQING YE 0.0000
\3-grams:
-0.3010 <s> BIRAJA DEVULAPALLI
-0.3010 <s> DEEPALI DEGLURKAR
-0.3010 <s> HENRY YEH
-0.3010 <s> JIMMY WANG
-0.3010 <s> JOON LEE
-0.3010 <s> KEVIN CHIOU
-0.3010 <s> RUCHI AGGARWAL
-0.3010 <s> SOUMYA SKALAHAS
-0.3010 <s> YINGQING YE
-0.3010 BIRAJA DEVULAPALLI </s>
-0.3010 DEEPALI DEGLURKAR </s>
-0.3010 HENRY YEH </s>
-0.3010 JIMMY WANG </s>
-0.3010 JOON LEE </s>
-0.3010 KEVIN CHIOU </s>
-0.3010 RUCHI AGGARWAL </s>
-0.3010 SOUMYA SKALAHAS </s>
-0.3010 YINGQING YE </s>
\end\
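A file in this format can be sanity-checked by comparing the counts declared under \data\ with the number of entries actually listed in each \N-grams: section. The awk sketch below does that. The function name is hypothetical, and it assumes the plain-text format shown above, not the binary .DMP file.

```shell
# Hypothetical helper: compare the "ngram N=count" header lines of a
# text-format language model with the entries found in each section.
check_arpa_counts() {
    awk '
        /^\\data\\/          { in_data = 1; next }
        /^\\[0-9]+-grams:/   { in_data = 0; order = substr($0, 2) + 0; next }
        /^\\end\\/           { order = 0; next }
        in_data && /^ngram / { split($2, kv, "="); declared[kv[1]] = kv[2]; next }
        order > 0 && NF > 0  { found[order]++ }    # one non-empty line per n-gram entry
        END {
            status = 0
            for (n = 1; n in declared; n++) {
                printf "%s-grams: declared %d, found %d\n", n, declared[n], found[n]
                if (declared[n] + 0 != found[n] + 0) status = 1
            }
            exit status                            # non-zero if any count disagrees
        }
    ' "$1"
}
```

It would be run as check_arpa_counts 7677.lm. Lines before \data\, such as the QuickLM header comments, are ignored.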
S2CONTINUOUS=/usr/local/bin/pocketsphinx_continuous
HMM=//source/PocketSphinx/myTest1/model_parameters/myTest1.cd_cont_1000/
LMFILE=/source/sphinxInfo/trainInfo/etc/7677.lm.DMP
DICT=/source/sphinxInfo/trainInfo/etc/7677.dic
$S2CONTINUOUS \
-fwdflat no -bestpath no \
-lm ${LMFILE} \
-dict ${DICT} \
-hmm ${HMM} \
-samprate 8000 \
-feat 1s_c_d_dd
Unfortunately, I am still getting the same error:
INFO: s2_semi_mgau.c(1120): Reading S3 mixture gaussian file '//source/PocketSphinx/myTest1/model_parameters/myTest1.cd_cont_1000//means'
FATAL_ERROR: "s2_semi_mgau.c", line 1150: //source/PocketSphinx/myTest1/model_parameters/myTest1.cd_cont_1000//means: #codebooks (360) != 1
--- (Edited on 2/6/2009 4:43 pm [GMT-0600] by jcwang) ---
I suppose it will be easier for both of us if you could just pack all your files into an archive and upload them somewhere.
From a quick look: fwdflat and bestpath greatly improve accuracy, there is no need to disable them.
--- (Edited on 2/6/2009 5:09 pm [GMT-0600] by nsh) ---
--- (Edited on 2/6/2009 10:03 pm [GMT-0600] by jcwang) ---