VoxForge
Hi all,
I've tried to use Keith Vertanen's acoustic model (AM) and language model (LM) with the Julius speech recognition decoder. However, I'm still not satisfied with the results: they are almost always wrong.
Here is what I have done so far. Please tell me what you think and help me out. I really want to make this work. :)
*) convert HTK's binary-hmmdefs --> HTK's ascii-hmmdefs:
cd wsj_all_10000_32
touch empty.hed
HHEd -H hmmdefs -w hmmdefs_ascii empty.hed tiedlist
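One way to sanity-check the converted model is to confirm that every physical model named in tiedlist is actually defined as an ~h macro in hmmdefs_ascii. This is a rough Python sketch, assuming the usual HTK layout (each tied-list line is either "name" or "logical physical"); the function names are my own and not part of HTK or Julius.

```python
import re


def physical_names(tiedlist_path):
    """Collect the physical HMM names referenced by an HTK HMM list.

    Each non-empty line is either "name" (a physical model) or
    "logical physical" (a logical model tied to a physical one),
    so the last field is always the physical name.
    """
    names = set()
    with open(tiedlist_path, encoding="latin-1") as f:
        for line in f:
            fields = line.split()
            if fields:
                names.add(fields[-1])
    return names


def defined_hmms(hmmdefs_path):
    """Collect the names of all ~h "name" macros in an ASCII HTK MMF."""
    with open(hmmdefs_path, encoding="latin-1") as f:
        return set(re.findall(r'~h\s+"([^"]+)"', f.read()))


def missing_models(tiedlist_path, hmmdefs_path):
    """Physical models listed in the HMM list but not defined in the MMF."""
    return sorted(physical_names(tiedlist_path) - defined_hmms(hmmdefs_path))
```

For example, missing_models("wsj_all_10000_32/tiedlist", "wsj_all_10000_32/hmmdefs_ascii") should return an empty list if the list and the model agree.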
*) convert ARPA N-gram --> binary N-gram (forward)
cd lm_giga
mkbingram -nlr lm_giga_64k_nvp_3gram.arpa lm_giga_64k_nvp_3gram_fw.bin
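Before or after converting, it can help to read the header of the ARPA file to see how many unigrams (roughly the vocabulary size, including <s> and </s>) and higher-order n-grams it contains. A minimal Python sketch, assuming a standard ARPA header; it stops at the first n-gram section, so it is fast even on a large file.

```python
import re


def arpa_ngram_counts(path):
    """Return {order: count} from the "ngram N=count" lines of an ARPA header."""
    counts = {}
    in_header = False
    with open(path, encoding="latin-1") as f:
        for line in f:
            line = line.strip()
            if line == "\\data\\":
                in_header = True
                continue
            if in_header:
                m = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
                if m:
                    counts[int(m.group(1))] = int(m.group(2))
                elif line.startswith("\\"):
                    break  # reached the first n-gram section, header is done
    return counts
```

For example, arpa_ngram_counts("lm_giga/lm_giga_64k_nvp_3gram.arpa") returns a dict such as {1: ..., 2: ..., 3: ...} with the counts declared in the file.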
*) convert HVite's dictionary --> Julius' dictionary:
cd lm_giga
cat lm_giga_64k_nvp.hvite.dic | gawk '{$1=sprintf("%s [%s]", $1, $1); print $0}' > 64k_nvp.julius.dic
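For reference, here is the same HVite-to-Julius dictionary conversion as a small Python sketch. It assumes each HVite line is "WORD phone phone ..."; unlike the gawk one-liner, it leaves lines that already carry an [OUTPUT] field untouched (the one-liner would add a second one) and keeps blank lines blank.

```python
def hvite_to_julius(line):
    """Convert "WORD ph1 ph2 ..." into the Julius form "WORD [WORD] ph1 ph2 ..."."""
    fields = line.split()
    if not fields:
        return ""  # keep blank lines blank
    if len(fields) > 1 and fields[1].startswith("["):
        return " ".join(fields)  # already has an output string
    return " ".join([fields[0], f"[{fields[0]}]"] + fields[1:])


def convert(src, dst):
    """Apply hvite_to_julius to every line of src and write the result to dst."""
    with open(src, encoding="latin-1") as fin, open(dst, "w", encoding="latin-1") as fout:
        for line in fin:
            fout.write(hvite_to_julius(line) + "\n")
```

Like gawk's field assignment, it re-joins the fields with single spaces.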
*) tweak the Julius' dictionary:
*) create a Julius configuration file (modeled on fast.jconf from Julius' dictation-kit-v4.1):
## Language Model
-d lm_giga/lm_giga_64k_nvp_3gram_fw.bin
-v lm_giga/64k_nvp.julius.dic
## Acoustic Model
-h wsj_all_10000_32/hmmdefs_ascii
-hlist wsj_all_10000_32/tiedlist
-n 5
-output 1
-input mic
-zmeanframe
-rejectshort 800
#-demo
#-debug
*) run Julius and test it using the mp3s of "cmu_com_kal_ldom" that I got from VoxForge
By the way, I got this error message: "Error: voca_malloc: maximum dict size exceeded limit (65535)." Is it related to the very poor performance?
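To see why the dictionary hits that limit, you can count its entries. This sketch assumes that Julius counts every dictionary line as a separate word entry, so pronunciation variants of one word each use up a slot; with a 64k vocabulary plus variants, the total can exceed 65,535 even if the LM itself has fewer words. The names here are hypothetical helpers, not Julius tools.

```python
from collections import Counter

WORD_LIMIT = 65535  # limit quoted in the voca_malloc error message


def dict_stats(path):
    """Return (entries, distinct_words, words_with_variants) for a Julius dictionary.

    Assumes each non-empty line is one entry, so pronunciation variants
    of the same word are counted separately.
    """
    entries = 0
    words = Counter()
    with open(path, encoding="latin-1") as f:
        for line in f:
            fields = line.split()
            if fields:
                entries += 1
                words[fields[0]] += 1
    with_variants = sum(1 for n in words.values() if n > 1)
    return entries, len(words), with_variants
```

For example, compare the first value of dict_stats("lm_giga/64k_nvp.julius.dic") with WORD_LIMIT.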
Thanks.
-arie
--- (Edited on 3/17/2011 12:31 pm [GMT-0500] by noegroz1987) ---
--- (Edited on 3/17/2011 12:39 pm [GMT-0500] by noegroz1987) ---
>run the Julius and test it using mp3s of
>"cmu_com_kal_ldom" I got from VoxForge
I am confused... the VoxForge version of cmu_com_kal_ldom only seems to include wav data. If you are trying to recognize mp3 audio with an acoustic model trained on wav data, there will be some degradation in recognition rates.
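A quick way to rule out a format mismatch is to check the sample rate, channel count and sample width of the test files. The expected values below (16 kHz, mono, 16-bit) are assumptions; compare them with how the acoustic model was trained and with Julius's -smpFreq setting.

```python
import wave


def wav_format(path):
    """Return (sample_rate, channels, sample_width_bytes) of a wav file."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth()


def matches_model(path, rate=16000, channels=1, sampwidth=2):
    """Return (ok, actual_format); the default expected format is an assumption."""
    actual = wav_format(path)
    return actual == (rate, channels, sampwidth), actual
```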
Another problem may be that you are trying to do dictation recognition using Keith's AMs and LMs, and these may not be designed for this...
>Error: voca_malloc: maximum dict size exceeded limit(65535)
See the Julius documentation:
Size Limit
The recognition dictionary is limited to 65,535 words.
However, at configuration time if the "-enable-word-int" option is used the dictionary can be extended to 2^31 words. At present performance is not guaranteed when using this option.
--- (Edited on 3/17/2011 6:20 pm [GMT-0400] by kmaclean) ---
Hi Ken,
Thanks for the response.
> I am confused... the VoxForge version of cmu_com_kal_ldom only seems to include wav data. If you are trying to recognize mp3 audio with an acoustic model trained on wav data, there will be some degradation in recognition rates.
I will also try the wav version; I'm still downloading it. Hopefully the results will be better.
> Size Limit: The recognition dictionary is limited to 65,535 words. However, at configuration time if the "-enable-word-int" option is used the dictionary can be extended to 2^31 words.
Thanks for the pointer. I missed it because I didn't expect to find that information in the 'libsent options' section (see the Juliusbook for v4.1.5).
Also, do you think the Julius configuration I'm using is good enough? Any suggestions?
-arie
--- (Edited on 3/18/2011 9:27 am [GMT+0700] by noegroz1987) ---
> However, at configuration time if the "-enable-word-int" option is used the dictionary can be extended to 2^31 words.
I tried that option when configuring Julius, but when I ran Julius with the above configuration, I got this error message:
Error: mymalloc_big: failed to allocate 1 x 4294907157 bytes
Any idea how to solve this problem?
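The requested size itself may be a clue: 4294907157 is just below 2^32 (4294967296), and reinterpreting it as a signed 32-bit integer gives -60139. One plausible reading, and it is only a guess from the number, is that a size calculation went negative or overflowed somewhere and was then passed on as a huge unsigned value, rather than Julius genuinely needing about 4 GB. The arithmetic:

```python
REQUESTED = 4294907157  # byte count from the mymalloc_big error message


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a two's-complement int32."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


print(2**32 - REQUESTED)       # 60139: distance below 2^32
print(as_signed32(REQUESTED))  # -60139
```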
Thanks.
-arie
--- (Edited on 3/18/2011 5:53 am [GMT-0500] by Visitor) ---
> I am confused... the VoxForge version of cmu_com_kal_ldom only seems to include wav data. If you are trying to recognize mp3 audio with an acoustic model trained on wav data, there will be some degradation in recognition rates.
I set the vocabulary size limit problem aside and tested with the wavs of "cmu_com_kal_ldom". Unfortunately, the recognition results were still very poor. For example:
Also, sometimes I got no result at all, only this warning: "WARNING: 00 _default: hypothesis stack exhausted, terminate search now".
I checked the input by recording it, and I think its quality is good. I also checked the ARPA n-gram file and found n-grams that cover the sentence "Please speak clearly and naturally". Any idea how to check whether the AM and LM are correct and usable?
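To check a sentence systematically rather than by eye, this sketch lists which of its unigrams, bigrams and trigrams (with <s> and </s> added) appear in the ARPA file. It streams the file instead of loading the whole LM into memory. Assumptions: the word case must match the LM (check whether it uses upper-case words), and the helper names are my own. A missing bigram or trigram is not fatal because of back-off, but a missing unigram means the word is out of vocabulary.

```python
import re


def sentence_ngrams(words, max_order=3):
    """All n-grams (as tuples) of a sentence, with <s> and </s> added."""
    toks = ["<s>"] + words + ["</s>"]
    return {tuple(toks[i:i + n])
            for n in range(1, max_order + 1)
            for i in range(len(toks) - n + 1)}


def ngram_coverage(arpa_path, words, max_order=3):
    """Return (found, missing) sets of the sentence's n-grams in an ARPA file."""
    wanted = sentence_ngrams(words, max_order)
    found = set()
    order = None  # current section order; None while still in the header
    with open(arpa_path, encoding="latin-1") as f:
        for line in f:
            line = line.strip()
            m = re.match(r"\\(\d+)-grams:$", line)
            if m:
                order = int(m.group(1))
            elif line == "\\end\\":
                break
            elif order and line:
                # line layout: log10 prob, the n words, optional back-off weight
                gram = tuple(line.split()[1:1 + order])
                if gram in wanted:
                    found.add(gram)
    return found, wanted - found
```

For example, ngram_coverage("lm_giga/lm_giga_64k_nvp_3gram.arpa", "PLEASE SPEAK CLEARLY AND NATURALLY".split()) shows which n-grams of that sentence the LM lacks.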
Thanks.
-arie
--- (Edited on 3/18/2011 11:12 pm [GMT+0700] by noegroz1987) ---
> Any idea how to check whether the AM and LM are correct
>and can be used?
For the AM, test it with a grammar. Unfortunately, I do not have much experience with LMs. I tried Julius dictation a while back but was not successful; I assumed that the acoustic model I was using (an early VoxForge AM) was not good enough.
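A small grammar test takes the LM out of the picture. This sketch writes the .grammar and .voca files of Julius's grammar format for a few fixed sentences, taking pronunciations from the Julius dictionary built earlier. Assumptions: the silence model in this AM is called "sil", and the helper names are hypothetical. As I understand the Julius tools, the result is then compiled with mkdfa.pl and loaded with -gram in place of the -d/-v lines.

```python
def load_julius_dict(path):
    """Map each word to its first pronunciation in a Julius dictionary."""
    lex = {}
    with open(path, encoding="latin-1") as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue
            # skip the [OUTPUT] field if present
            phones = fields[2:] if fields[1].startswith("[") else fields[1:]
            lex.setdefault(fields[0], " ".join(phones))
    return lex


def write_grammar(name, sentences, lexicon, silence="sil"):
    """Write NAME.grammar and NAME.voca accepting only the given sentences.

    Each distinct word gets its own category (W0, W1, ...) and each
    sentence becomes one "S : NS_B ... NS_E" rule.
    """
    cats = {}  # word -> category name, in order of first appearance
    for sent in sentences:
        for word in sent:
            cats.setdefault(word, f"W{len(cats)}")
    with open(f"{name}.grammar", "w") as g:
        for sent in sentences:
            g.write("S : NS_B " + " ".join(cats[w] for w in sent) + " NS_E\n")
    with open(f"{name}.voca", "w") as v:
        v.write(f"% NS_B\n<s>\t{silence}\n\n% NS_E\n</s>\t{silence}\n")
        for word, cat in cats.items():
            v.write(f"\n% {cat}\n{word}\t{lexicon[word]}\n")
```

For example, write_grammar("amtest", [["PLEASE", "SPEAK", "CLEARLY", "AND", "NATURALLY"]], load_julius_dict("lm_giga/64k_nvp.julius.dic")) produces amtest.grammar and amtest.voca. If recognition is still poor on such a tiny grammar, the problem is more likely in the AM or the audio than in the LM.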
My understanding is that dictation is very difficult to make work well for multiple speakers, but if you adapt a generic acoustic model to a particular user, you may get better results.
--- (Edited on 3/18/2011 6:27 pm [GMT-0400] by kmaclean) ---
> but if you adapt a generic acoustic model to a particular user,
>then you may get better results
See this post on Nickolay's blog: "How to create a speech recognition application for your needs". He talks about a server-based setup, but a similar approach could be used for dictation on a single computer.
--- (Edited on 3/20/2011 9:09 pm [GMT-0400] by kmaclean) ---