VoxForge
In the last step there's the following instruction:
Then execute the HDMan command against the entire lexicon file, not just the training dictionary we have used thus far (that is why we needed to create a phonetically balanced grammar in Step 2):
$HDMan -A -D -T 1 -b sp -n fulllist -g global.ded -l flog dict-tri ../lexicon/voxforge_lexicon
But this introduces some phonemes that appear only in voxforge_lexicon and not in our dict, so there is no audio for them.
This causes an error when running:
$HHEd -A -D -T 1 -H hmm12/macros -H hmm12/hmmdefs -M hmm13 tree.hed triphones1
ERROR [+2662] FindProtoModel: no proto for p in hSet
"p" is just one example of a phoneme that has no recorded audio.
I tried using dict instead of ../lexicon/voxforge_lexicon, and it worked correctly.
Am I wrong?
Thanks
Manuel
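One way to confirm which phonemes the full lexicon introduces is to compare the phone sets of the two dictionary files. The following is a minimal Python sketch, not part of the tutorial. It assumes each line has the form WORD [OUTSYM] phone phone ... with no probability field; the function name and the sample entries are made up for illustration.

```python
def phones_in_dictionary(lines):
    """Collect the phone symbols used in HTK-style dictionary lines.

    Assumed line format: WORD [OUTSYM] phone phone ...
    (the bracketed output symbol is treated as optional).
    """
    phones = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rest = fields[1:]
        if rest[0].startswith("["):
            rest = rest[1:]  # drop the output symbol
        phones.update(rest)
    return phones


# Hypothetical sample entries; real data would be read from
# dict and ../lexicon/voxforge_lexicon.
training_dict = [
    "CAT [CAT] k ae t",
    "DOG [DOG] d ao g",
]
full_lexicon = [
    "CAT [CAT] k ae t",
    "DOG [DOG] d ao g",
    "PIG [PIG] p ih g",
]

# Phones that appear in the lexicon but were never seen in training.
unseen = phones_in_dictionary(full_lexicon) - phones_in_dictionary(training_dict)
print(sorted(unseen))
```

If the set is not empty for the real files, those phones have no trained model, which is consistent with the FindProtoModel error above.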
Hi Manuel,
The VoxForge Lexicon ( VoxForge.tgz 01-Sep-2007 16:48 2.6M ) is based on the CMU dictionary ( cmu.tgz 29-Apr-2007 23:53 3.1M).
The How-to and Tutorial, on the other hand, use the smaller Switchboard dictionary ( ISIP.tgz 29-Apr-2007 23:53 296k), which has slightly different pronunciations from CMU's dictionary.
I have not had the chance to go back and change the how-to and tutorials to use the updated VoxForge Lexicon.
Ken
I'm still a bit confused about this problem!
So, does it mean that if we go back to step 2 when we first used the lexicon file, and we replace the one used there with the one contained in VoxForge.tgz the HHEd command in Step 10 works?
Thanks
>does it mean that if we go back to step 2 when we first used the lexicon file,
>and we replace the one used there with the one contained in VoxForge.tgz
>the HHEd command in Step 10 works?
No. The tutorial works. Problems occur when you change the phone set used to train the acoustic model.
A phone set is the list of phonemes in your pronunciation dictionary (also called lexicon). The Switchboard dictionary (used in the tutorial) uses a different phone set from the one used in VoxForge lexicon (which is used in training the acoustic models in the nightly builds and quickstart).
This is OK for a tutorial, but if you want to use a different pronunciation dictionary with more words (which may or may not use a different phone set for its pronunciations), then you need to take that into account in the earlier tutorial steps.
Thanks for the answer Ken!
My problem was that I was trying to run the tutorial on one of the corpora available on VoxForge, and two phonemes were missing from the phoneme list.
So, if I understood correctly, I have two options in this case:
Thank you again ;)
Giuliano
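The situation described above, where a corpus leaves a few lexicon phonemes without any audio, can be checked before training. This is a rough Python sketch under stated assumptions: dictionary lines look like WORD [OUTSYM] phone phone ..., the corpus is reduced to a set of upper-case words taken from its prompts, and all names and sample data are hypothetical.

```python
def load_pronunciations(lines):
    """Map each word to the set of phones used by its pronunciation(s).

    Assumed line format: WORD [OUTSYM] phone phone ...
    Variants of the same word are merged into one set, which may
    slightly overestimate coverage.
    """
    entries = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word, rest = fields[0], fields[1:]
        if rest[0].startswith("["):
            rest = rest[1:]  # drop the output symbol
        entries.setdefault(word, set()).update(rest)
    return entries


def uncovered_phones(corpus_words, lexicon_lines):
    """Return lexicon phones that no word in the corpus ever uses."""
    entries = load_pronunciations(lexicon_lines)
    all_phones = set().union(*entries.values()) if entries else set()
    covered = set()
    for word in corpus_words:
        covered |= entries.get(word, set())
    return all_phones - covered


# Hypothetical sample data for illustration only.
lexicon = [
    "THE [THE] dh ax",
    "ZOO [ZOO] z uw",
    "MEASURE [MEASURE] m eh zh er",
]
corpus_words = {"THE", "ZOO"}

print(sorted(uncovered_phones(corpus_words, lexicon)))
```

Any phone reported here has no training audio in that corpus, so either the corpus or the dictionary passed to HDMan would need to change.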
>we replace the one used there with the one contained
>in VoxForge.tgz the HHEd command in Step 10 works?
Sorry, I should have been clearer - this will work too.
>Take another corpus that includes all the phonemes present in the lexicon.
The corpus is important, but it is not the main issue here. I think the issue is the phone set used by a given pronunciation dictionary. The speech recognition engine does not care what you call your phones; it just tries to apply them consistently.
So for the same corpus, you could train acoustic models using different pronunciation dictionaries and get very similar acoustic models, yet different enough that a model trained with one dictionary will not work with another pronunciation dictionary.
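To illustrate why a model trained with one dictionary may not work with another, the sketch below lists the words in a second dictionary whose pronunciations use phone names the trained model does not know. It is only an illustration in Python: the phone names, the entries, and the function name are invented, and the line format WORD [OUTSYM] phone phone ... is an assumption.

```python
def unusable_words(dictionary_lines, trained_phones):
    """Map each word to the phones it needs that the model was not trained on.

    Assumed line format: WORD [OUTSYM] phone phone ...
    """
    problems = {}
    for line in dictionary_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word, rest = fields[0], fields[1:]
        if rest[0].startswith("["):
            rest = rest[1:]  # drop the output symbol
        missing = {phone for phone in rest if phone not in trained_phones}
        if missing:
            problems.setdefault(word, set()).update(missing)
    return problems


# Hypothetical phone set of a model trained with dictionary A.
trained_phones = {"k", "ae", "t", "ax"}

# Hypothetical dictionary B that names some phones differently.
other_dictionary = [
    "CAT [CAT] k ae t",
    "ABOUT [ABOUT] ah b aw t",
]

for word, missing in unusable_words(other_dictionary, trained_phones).items():
    print(word, sorted(missing))
```

Words that come back from this check cannot be recognized with that model unless the dictionary is rewritten in the model's phone set or the model is retrained.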