Acoustic Model Discussions

Regarding Speaker independent acoustic model to recognise proper nouns
User: bharathi.ravishanker
Date: 7/14/2010 4:25 pm
Views: 6741
Rating: 2

I am working on building a speaker-independent acoustic system to recognise proper nouns using Julius. I downloaded the speaker-independent acoustic model from the VoxForge site, but since I have proper nouns that are not in the dict file, I used my own dict file together with the rest of the files from the speaker-independent acoustic model download. Creating the acoustic model works fine, with no errors at any stage (building the acoustic model using the scripts, and using our acoustic model along with the speaker-independent acoustic model). But when I run Julius, I get errors like:
Error: voca_load_htkdict:triphone <some combination of triphone> not found.
Apparently this bunch of triphones is not found in the tiedlist.

So if I have to use proper nouns, it's not sufficient to add those words in the VoxforgeDict file and use the words.mlf file with the speaker-independent acoustic model? Do I have to build my own hmmdefs and tiedlist? I badly want to use the acoustic model available on VoxForge, since it is built from a huge speech database.

Your help is greatly appreciated. Thanks a ton

--- (Edited on 7/14/2010 4:25 pm [GMT-0500] by bharathi.ravishanker) ---

Re: Regarding Speaker independent acoustic model to recognise proper nouns
User: kmaclean
Date: 7/15/2010 10:57 am
Views: 85
Rating: 3

>So if I have to use proper nouns, it's not sufficient to add those words in the VoxForgeDict file

Correct.

In Step 10 of the Tutorial, tied-state triphones were generated using the VoxForge pronunciation dictionary - and your proper nouns were not included.  As a result, some triphones that you need in your proper nouns were not created.
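To see which triphones a new word needs, the check can be sketched in Python. This is a minimal sketch, not part of the VoxForge scripts: the function names are made up for illustration, it only covers word-internal triphones in HTK's left-centre+right naming, and the names standing in for the tiedlist are invented.

```python
def word_internal_triphones(phones):
    """Expand a phone sequence into word-internal triphone names (L-C+R)."""
    names = []
    for i, centre in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        names.append(left + centre + right)
    return names


def missing_triphones(phones, known_names):
    """Return the triphones of one word that are absent from known_names."""
    return [t for t in word_internal_triphones(phones) if t not in known_names]


# Hypothetical data: in practice known_names would hold the first field of
# every line in the model's tiedlist file.
known_names = {"g+iy", "g-iy+th", "th-ax"}
print(missing_triphones(["g", "iy", "th", "ax"], known_names))
```

Any name this reports for a proper noun is a triphone that Julius would not find when loading the dictionary.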

>Do I have to build my own hmmdefs and tiedlist ?

Yes.

Or create a set of GPL (or public domain) recordings & transcriptions of the proper nouns (at least 5 samples each) you are looking for, and I can add them to the VoxForge corpus.  I will also need pronunciations for each of these proper nouns so that I can add them to the pronunciation dictionary.

Once that is done, the nightly acoustic model creation scripts will generate a new acoustic model with the required triphones.

Ken

--- (Edited on 7/15/2010 11:57 am [GMT-0400] by kmaclean) ---

Re: Regarding Speaker independent acoustic model to recognise proper nouns
User: bharathi.ravishanker
Date: 7/15/2010 1:07 pm
Views: 50
Rating: 1

Thanks Ken. These proper nouns need to be created by the user and added on the fly. I am creating a system where users call people by saying "call <name>". I do not know what names the user is going to add; it can be anything! Isn't there any way out apart from building it from scratch? How about monophones? Is there any way to merge the existing monophone acoustic model with a monophone model built from what the user is going to record? Thanks a ton.

--- (Edited on 7/15/2010 1:07 pm [GMT-0500] by bharathi.ravishanker) ---

Re: Regarding Speaker independent acoustic model to recognise proper nouns
User: kmaclean
Date: 7/15/2010 2:09 pm
Views: 99
Rating: 2

>Isn't there any way out apart from building it from scratch.

I don't think you can get away without trying to include as many of the different triphones your users might use...

For something a little quicker than a full build, you might try the following:

1. Download the developer version of a current nightly build:

XXX_HTK_AcousticModel-2010-07-15_8kHz_16bit_MFCC_O_D_devel.tgz 15-Jul-2010 05:57 40.4M

2. Update the VoxForge pronunciation dictionary (voxforge_lexicon) used in step 10 of the VoxForge Tutorial with the proper nouns and pronunciations you think your user might use (use the phone book...).

3. Then perform Step 10 of the VoxForge Tutorial on the hmms in the hmm12 folder of the developer version of the nightly build.

HTK will then try to map the unseen triphones used in the proper nouns you want to the physical triphones that actually exist in the VoxForge acoustic model.
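The resulting logical-to-physical mapping ends up in the tiedlist. Here is a rough sketch of reading it, assuming the usual HTK model-list layout where a line holds either one name (a physical model) or a logical name followed by the physical model it shares. The function name and the sample lines are invented for illustration.

```python
def parse_tiedlist(lines):
    """Map each logical triphone name to the physical model it is tied to."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        logical = fields[0]
        # A single-field line names a physical model that maps to itself.
        physical = fields[1] if len(fields) > 1 else logical
        mapping[logical] = physical
    return mapping


# Invented sample lines, not taken from a real VoxForge tiedlist.
sample = ["iy-th+ah", "iy-th+ax iy-th+ah", "", "th-ax"]
tied = parse_tiedlist(sample)
print(tied["iy-th+ax"])
```

A triphone that is missing from the keys of this mapping has no model at all, which is the situation the Julius error describes.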

> How about monophones.

Give it a try and see what recognition rates you get... triphones usually give much better recognition.

>Any way to merge existing acoustic monophones model to the monophones model of what the user is going to record?

I am not sure what you are asking here... you can adapt an existing acoustic model (like the VoxForge acoustic model) to a person's voice... but that does not help you with out-of-vocabulary words.

G2P (like Sequitur G2P) can help you get approximate pronunciations of out-of-vocabulary words, but you still need some way to automate the mapping from the logical triphone to the physical triphone.

You might try the Julius or Sphinx forums and ask them how they might do something like this.

Ken

--- (Edited on 7/15/2010 3:09 pm [GMT-0400] by kmaclean) ---

Re: Regarding Speaker independent acoustic model to recognise proper nouns
User: bharathi.ravishanker
Date: 7/15/2010 6:04 pm
Views: 138
Rating: 2

Thanks Ken. This is what I am asking :-) I already have Festival to convert words to phones.


So the words (proper nouns) that I add should also be recorded, right? For example, if I add the word 'Geetha' to the dictionary, the word should be recorded as an audio file and converted to MFCC format, and the other processing should be done, right? Is it OK to start from Step 10? I remember reading that the words we use in the vocabulary should be present as audio in the acoustic model. Thanks :)

--- (Edited on 7/15/2010 6:04 pm [GMT-0500] by bharathi.ravishanker) ---

Re: Regarding Speaker independent acoustic model to recognise proper nouns
User: kmaclean
Date: 7/18/2010 1:32 pm
Views: 97
Rating: 1

>So the words (proper nouns) that I add should also be recorded right?

No.

As described in my preceding post, you only need to add the new word ("Geetha", with a pronunciation of something like "g iy th ax") to the voxforge_lexicon file before running HDMan in Step 10.
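Adding an entry to a pronunciation dictionary could be scripted roughly like this. It is a sketch only: it assumes the "WORD [WORD] phones" line layout and a dictionary kept in plain ascending string order, and the helper names and sample entries are illustrative rather than taken from the actual lexicon.

```python
import bisect


def format_entry(word, phones):
    """Build one dictionary line in the assumed WORD [WORD] phones layout."""
    upper = word.upper()
    return f"{upper} [{upper}] {' '.join(phones)}"


def add_entry(entries, word, phones):
    """Insert a new entry into an already sorted list of dictionary lines."""
    line = format_entry(word, phones)
    if line not in entries:
        bisect.insort(entries, line)  # keeps the list sorted
    return entries


# Invented sample entries for illustration.
lexicon = ["GEESE [GEESE] g iy s", "GET [GET] g eh t"]
add_entry(lexicon, "Geetha", ["g", "iy", "th", "ax"])
print("\n".join(lexicon))
```

Whether the sort order used here matches what HDMan expects for a given dictionary is an assumption worth checking against the HDMan log.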

Theoretically, no recording should be required, since Step 10 will try to map the triphones in the word "Geetha" (i.e. the "logical" triphones, which may not exist in the training data) to actual "physical" triphones that were generated from the speech training data. You will need to try this out to confirm it.

You could also get recordings for your proper nouns, and then adapt the VoxForge acoustic model with these recordings.

--- (Edited on 7/18/2010 2:32 pm [GMT-0400] by kmaclean) ---

Re: Regarding Speaker independent acoustic model to recognise proper nouns
User: bharathi.ravishanker
Date: 7/19/2010 12:47 pm
Views: 112
Rating: 2

Thanks Ken. I get this error message when I try the procedure with the new dictionary added in Step 10:

 

$ HHEd -A -D -T 1 -H ./interim_files/hmm12/macros -H ./interim_files/hmm12/hmmdefs -M ./interim_files/hmm13 ./interim_files/tree.hed ./interim_files/triphones1 > logs/Step10_HHed_hmm13_log
  ERROR [+7251]  LoadStatsFile: unknown name b at line 4
 FATAL ERROR - Terminating program HHEd.

 

Thanks

--- (Edited on 7/19/2010 12:47 pm [GMT-0500] by bharathi.ravishanker) ---

Re: Regarding Speaker independent acoustic model to recognise proper nouns
User: kmaclean
Date: 7/19/2010 2:10 pm
Views: 106
Rating: 2

>error message when I am trying to do the procedure with the new dictionary added in step10

Oops, I was wrong in a previous post when I said you need to update the voxforge_lexicon. Unfortunately, the tutorial uses a slightly different set of phonemes than the one used in the VoxForge acoustic model.

You need to use the pronunciation dictionary that was used in the creation of the VoxForge acoustic models:

VoxForge.tgz 18-Oct-2009 01:40 1.3M

and replace any of the input or interim files shown in Step 10 with the files in input_files or interim_files folders of the developer version of the nightly build.
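Before rerunning Step 10, a phone-set mismatch like the one described above can be checked for by comparing the phones used in the dictionary with the model's monophone list. This is a minimal sketch, assuming "WORD [OUTPUT] phones" dictionary lines. The function names and the sample data are invented, and it is not claimed to be the cause of the HHEd error in this thread.

```python
def phones_in_dictionary(lines):
    """Collect every phone used in the pronunciations of a dictionary."""
    phones = set()
    for line in lines:
        fields = line.split()
        # Drop the word itself and an optional [OUTPUT] field.
        pron = [f for f in fields[1:] if not f.startswith("[")]
        phones.update(pron)
    return phones


def unknown_phones(dict_lines, model_phones):
    """Return dictionary phones that the acoustic model does not define."""
    return sorted(phones_in_dictionary(dict_lines) - set(model_phones))


# Invented sample data for illustration only.
dict_lines = ["GEETHA [GEETHA] g iy th ax", "BOY [BOY] b oy"]
model_phones = ["g", "iy", "th", "ax", "b", "sil"]
print(unknown_phones(dict_lines, model_phones))
```

An empty result would suggest the dictionary and the model at least agree on the phone inventory.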

It is probably easier to just try adapting the VoxForge acoustic model for your purposes.

--- (Edited on 7/19/2010 3:10 pm [GMT-0400] by kmaclean) ---

Re: Regarding Speaker independent acoustic model to recognise proper nouns
User: bharathi.ravishanker
Date: 7/22/2010 12:09 pm
Views: 2651
Rating: 2

Ken,

I think I get the same error after replacing the dictionary with the one you specified:

 

HHEd
 9382/9382 Models Loaded [5 states max, 1 mixes max]
RO 100.00 ''
 Setting outlier threshold for clustering
 RO->LS stats
  and loading state occupation stats
  ERROR [+7251]  LoadStatsFile: unknown name b at line 4
 FATAL ERROR - Terminating program HHEd.

I desperately need some help here. Thanks a lot.

--- (Edited on 7/22/2010 12:09 pm [GMT-0500] by bharathi.ravishanker) ---
