VoxForge
I am working on building a speaker-independent acoustic
system to recognise proper nouns using Julius. I downloaded the
speaker-independent acoustic model from the VoxForge site, but since I have proper
nouns that are not in the dict file, I used my own dict file together with the rest
of the files that come with the speaker-independent acoustic model
download. Creating the acoustic model works fine, with no
errors at any stage (building the acoustic model using the scripts, using our
acoustic model along with the speaker-independent acoustic model).
But when I run Julius, I get errors like
Error: voca_load_htkdict:triphone <some combination of triphone>
not found. Apparently these triphones are not present in the
tiedlist.
So if I have to use proper nouns, it's not sufficient to
add those words to the VoxforgeDict file and use the words.mlf file
with the speaker-independent acoustic model? Do I have to build my own
hmmdefs and tiedlist? I very much want to use the acoustic model
available on VoxForge, since it is built from a huge speech database.
Your help is greatly appreciated. Thanks a ton
--- (Edited on 7/14/2010 4:25 pm [GMT-0500] by bharathi.ravishanker) ---
>So if I have to use proper nouns, it's not sufficient to add those words in
>the VoxforgeDict file
Correct.
In Step 10 of the Tutorial, tied-state triphones were generated using the VoxForge pronunciation dictionary - and your proper nouns were not included. As a result, some triphones that you need in your proper nouns were not created.
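To see which triphones a new word would need, a minimal Python sketch can expand its pronunciation into word-internal triphone names and compare them with the names already known to the model. It assumes the usual HTK-style `l-c+r` naming; the `known` set below is made up for illustration and is not taken from the real tiedlist.

```python
def word_internal_triphones(phones):
    """Expand a phone sequence into word-internal triphone names (l-c+r)."""
    names = []
    last = len(phones) - 1
    for i, phone in enumerate(phones):
        # No left context on the first phone, no right context on the last.
        left = f"{phones[i - 1]}-" if i > 0 else ""
        right = f"+{phones[i + 1]}" if i < last else ""
        names.append(f"{left}{phone}{right}")
    return names


def missing_triphones(phones, known):
    """Return the triphones of a word that are absent from the known set."""
    return [t for t in word_internal_triphones(phones) if t not in known]


# Hypothetical set of triphone names, standing in for the tiedlist contents.
known = {"g+iy", "g-iy+th", "th-ax"}
print(missing_triphones(["g", "iy", "th", "ax"], known))
```

Any name this reports as missing is one that Julius would not be able to find when it loads the dictionary.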
>Do I have to build my own hmmdefs and tiedlist ?
Yes.
Or create a set of GPL (or public domain) recordings & transcriptions of the proper nouns (at least 5 samples each) you are looking for, and I can add them to the VoxForge corpus. I will also need pronunciations for each of these proper nouns so that I can add them to the pronunciation dictionary.
Once that is done, the nightly acoustic model creation scripts will generate a new one with the required triphones.
Ken
--- (Edited on 7/15/2010 11:57 am [GMT-0400] by kmaclean) ---
Thanks Ken. These proper nouns need to be created by the user and added on the fly. I am creating a system where users call people by saying "call <name>". I do not know what names the user is going to add; it can be anything! Isn't there any way out apart from building it from scratch? How about monophones? Is there any way to merge the existing acoustic monophone model with a monophone model built from what the user is going to record? Thanks a ton
--- (Edited on 7/15/2010 1:07 pm [GMT-0500] by bharathi.ravishanker) ---
>Isn't there any way out apart from building it from scratch.
I don't think you can get away without trying to include as many of the different triphones your users might use...
For something a little quicker than a full build, you might try the following:
1. download the developer version of a current nightly build:
XXX_HTK_AcousticModel-2010-07-15_8kHz_16bit_MFCC_O_D_devel.tgz 15-Jul-2010 05:57 40.4M
2. Update the VoxForge pronunciation dictionary (voxforge_lexicon) used in step 10 of the VoxForge Tutorial with the proper nouns and pronunciations you think your user might use (use the phone book...).
3. Then perform Step 10 of the VoxForge Tutorial on the hmms in the hmm12 folder of the developer version of the nightly build.
HTK will then try to map the unseen triphones used in proper nouns you want to the physical triphones that actually exist in the VoxForge acoustic model.
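The result of that mapping ends up in the tiedlist. As a rough illustration, the Python sketch below reads tiedlist-style lines into a logical-to-physical lookup. It assumes each line holds either one name (a physical model) or two names (a logical name followed by the physical model it is tied to); the sample lines are invented.

```python
def parse_tiedlist(lines):
    """Map each logical triphone name to its physical model name."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        logical = fields[0]
        # A single-name line means the model is its own physical model.
        physical = fields[1] if len(fields) > 1 else logical
        mapping[logical] = physical
    return mapping


# Invented sample lines, not copied from the VoxForge tiedlist.
sample = ["g+iy", "iy-th+ax iy-th+ah", ""]
for logical, physical in parse_tiedlist(sample).items():
    print(logical, "->", physical)
```

A triphone that appears as a key in this lookup is one the recogniser should be able to resolve.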
> How about monophones.
Give it a try and see what recognition rates you get... triphones usually give much better recognition.
>Any way to merge existing acoustic monophones model to the
>monophones model of what the user is going to record?
I am not sure what you are asking here... you can adapt an existing acoustic model (like the VoxForge acoustic model) to a person's voice... but that does not help you with out-of-vocabulary words.
G2P (like Sequitur G2P) can help you get approximate pronunciations of out-of-vocabulary words, but you still need some way to automate the mapping from the logical triphone to the physical triphone.
You might try the Julius or Sphinx forums and ask them how they might do something like this.
Ken
--- (Edited on 7/15/2010 3:09 pm [GMT-0400] by kmaclean) ---
Thanks Ken. This is what I am asking :-) I already have Festival to convert words to phones.
So the words (proper nouns) that I add should also be recorded, right? For example, if I add the word 'Geetha' to the dictionary, should the word be recorded as an audio file, converted to MFCC format, and put through the rest of the processing? Or is it OK to just start from Step 10? I ask because I remember reading that the words we use in the vocabulary should be present as audio in the acoustic model. Thanks :)
--- (Edited on 7/15/2010 6:04 pm [GMT-0500] by bharathi.ravishanker) ---
>So the words (proper nouns) that I add should also be recorded right?
No
As described in my preceding post, you only need to add the new word ("Geetha", with a pronunciation something like "g iy th ax") to the voxforge_lexicon file before running HDMan in Step 10.
Theoretically, no recording should be required, since Step 10 will try to map the triphones in the word "Geetha" (i.e. the "logical" triphones, which may not exist in the training data) to actual "physical" triphones that were generated from the speech training data. You will need to try this out to confirm it.
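Adding such an entry can be scripted. The Python sketch below assumes the `WORD [WORD] phone phone ...` line layout and a plain string sort; both are assumptions, so check them against the actual lexicon file and against the ordering HDMan expects before relying on it. The existing entries shown are invented examples.

```python
def format_entry(word, phones):
    """Build one dictionary line in the assumed 'WORD [WORD] phones' layout."""
    upper = word.upper()
    return f"{upper} [{upper}] {' '.join(phones)}"


def add_entry(entries, word, phones):
    """Return a sorted copy of the entries with the new word included once."""
    return sorted(set(entries) | {format_entry(word, phones)})


# Invented existing entries, for illustration only.
lexicon = ["GEESE [GEESE] g iy s", "GET [GET] g eh t"]
for line in add_entry(lexicon, "Geetha", ["g", "iy", "th", "ax"]):
    print(line)
```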
You could also get recordings for your proper nouns, and then adapt the VoxForge acoustic model with these recordings.
--- (Edited on 7/18/2010 2:32 pm [GMT-0400] by kmaclean) ---
Thanks Ken. I get this error message when I try the procedure in Step 10 with the new dictionary added:
$ HHEd -A -D -T 1 -H ./interim_files/hmm12/macros -H ./interim_files/hmm12/hmmdefs -M ./interim_files/hmm13 ./interim_files/tree.hed ./interim_files/triphones1 > logs/Step10_HHed_hmm13_log
ERROR [+7251] LoadStatsFile: unknown name b at line 4
FATAL ERROR - Terminating program HHEd.
Thanks
--- (Edited on 7/19/2010 12:47 pm [GMT-0500] by bharathi.ravishanker) ---
>error message when I am trying to do the procedure with the new
>dictionary added in step10
Oops, I was wrong when I said in a previous post that you need to update the voxforge_lexicon. Unfortunately, the tutorial uses a slightly different set of phonemes than the one used in the VoxForge acoustic model.
You need to use the pronunciation dictionary that was used in the creation of the VoxForge acoustic models:
VoxForge.tgz 18-Oct-2009 01:40 1.3M
and replace any of the input or interim files shown in Step 10 with the files in input_files or interim_files folders of the developer version of the nightly build.
It is probably easier to just try adapting the VoxForge acoustic model for your purposes.
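A quick way to spot a phoneme-set mismatch like this is to compare the phones used in a dictionary with the phones the model knows about. The Python sketch below is only an illustration: it assumes dictionary lines of the form `WORD [WORD] phone phone ...` (the bracketed field is treated as optional), and the dictionary lines and model phone set are invented.

```python
def dict_phones(entries):
    """Collect every phone symbol used in the dictionary lines."""
    phones = set()
    for line in entries:
        fields = line.split()
        # Skip the word and, if present, the bracketed output symbol.
        start = 2 if len(fields) > 1 and fields[1].startswith("[") else 1
        phones.update(fields[start:])
    return phones


def unknown_phones(entries, model_phones):
    """Return dictionary phones that the model's phone set does not contain."""
    return sorted(dict_phones(entries) - set(model_phones))


# Invented data: the model phone set here lacks "ax".
entries = ["GEETHA [GEETHA] g iy th ax", "GET [GET] g eh t"]
model_phones = {"g", "iy", "th", "eh", "t"}
print(unknown_phones(entries, model_phones))
```

An empty result would suggest the dictionary and the model at least agree on the phone inventory.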
--- (Edited on 7/19/2010 3:10 pm [GMT-0400] by kmaclean) ---
Ken,
I think I get the same error after replacing the dictionary with the one you specified:
HHEd
9382/9382 Models Loaded [5 states max, 1 mixes max]
RO 100.00 ''
Setting outlier threshold for clustering
RO->LS stats
and loading state occupation stats
ERROR [+7251] LoadStatsFile: unknown name b at line 4
FATAL ERROR - Terminating program HHEd.
I desperately need some help here. Thanks a lot
--- (Edited on 7/22/2010 12:09 pm [GMT-0500] by bharathi.ravishanker) ---