VoxForge
Hi all,
I'm not an expert with HTK, but I've built a speech recogniser by reading the HTK book and following the instructions on the VoxForge site. I trained this recogniser on 1132 sentences from the CMU ARCTIC database and tested it on 50 sentences from the same corpus. I've defined a grammar with one variable, $word, representing all the words in the 50 test sentences, as follows:
(SENT-START (<$word>) SENT-END)
The problem is that when I evaluated the HVite output, the % correct was below about 60% (and much lower when the -l option was not used with HParse).
Should I use a language model to obtain a better word network? How can I do that?
Another thing: chapter 15 of the HTK book describes how to create a language model. How can I use such a model to get a word network like the one generated with HParse?
I need your help (with details, please).
--- (Edited on 11/28/2010 6:02 pm [GMT+0100] by moughr) ---
> tested it with 50 sentences from the same corpus.
You should be getting better recognition rates with 50 test sentences from the same corpus. Have you tried tweaking your HVite word insertion penalty and/or grammar scale factor?
See section 3.4.1 of the HTK book (Step 12):
The options -p and -s set the word insertion penalty and the grammar scale factor, respectively. The word insertion penalty is a fixed value added to each token when it transits from the end of one word to the start of the next. The grammar scale factor is the amount by which the language model probability is scaled before being added to each token as it transits from the end of one word to the start of the next. These parameters can have a significant effect on recognition performance and hence, some tuning on development test data is well worthwhile.
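One way to do that tuning is a small grid search over -p and -s. The Python sketch below only builds the HVite command line for each pair of values and parses the WORD line that HResults prints. The file names (hmm15/macros, hmm15/hmmdefs, test.scp, wdnet, dict, tiedlist, testref.mlf) are assumptions taken from the HTK book tutorial layout, and the function names are made up for illustration, so adjust them to your own setup.

```python
import re
import subprocess
from itertools import product

# Matches the word-level summary line printed by HResults, e.g.
# "WORD: %Corr=63.91, Acc=59.40 [H=85, D=35, S=13, I=6, N=133]"
WORD_LINE = re.compile(r"WORD: %Corr=([\d.]+), Acc=(-?[\d.]+)")


def hvite_command(penalty, scale, out_mlf):
    """Build an HVite command line (paths are assumed, tutorial-style names)."""
    return [
        "HVite",
        "-H", "hmm15/macros",
        "-H", "hmm15/hmmdefs",
        "-S", "test.scp",
        "-l", "*",
        "-i", out_mlf,
        "-w", "wdnet",
        "-p", str(penalty),   # word insertion penalty
        "-s", str(scale),     # grammar scale factor
        "dict", "tiedlist",
    ]


def parse_word_scores(hresults_output):
    """Return (%Corr, Acc) from the text printed by HResults."""
    match = WORD_LINE.search(hresults_output)
    if match is None:
        raise ValueError("no WORD line found in HResults output")
    return float(match.group(1)), float(match.group(2))


def run_sweep(penalties, scales, ref_mlf="testref.mlf"):
    """Run HVite + HResults for every (penalty, scale) pair.

    Needs the HTK binaries on PATH; returns a list of
    (penalty, scale, corr, acc) tuples sorted by accuracy, best first.
    """
    results = []
    for penalty, scale in product(penalties, scales):
        out_mlf = f"recout_p{penalty}_s{scale}.mlf"
        subprocess.run(hvite_command(penalty, scale, out_mlf), check=True)
        report = subprocess.run(
            ["HResults", "-I", ref_mlf, "tiedlist", out_mlf],
            check=True, capture_output=True, text=True,
        ).stdout
        corr, acc = parse_word_scores(report)
        results.append((penalty, scale, corr, acc))
    return sorted(results, key=lambda r: r[3], reverse=True)
```

For example, run_sweep([-20.0, -10.0, 0.0], [1.0, 5.0, 10.0]) would try nine combinations. Those ranges are only a guess at a starting point, and ideally the sweep is run on a development set kept separate from the final 50 test sentences.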
--- (Edited on 3/21/2011 12:26 am [GMT-0400] by kmaclean) ---