VoxForge
Hello
I had a speaker-independent model created with HTK version 3.4
(the "hmmdefs" and "macros" files combined into a single "models" file).
Then I created an adapted model from it with HTK version 3.2.1, as described in the tutorial
http://www.voxforge.org/home/dev/acousticmodels/windows/adapt/htkjulius
The adaptation data was 30 long utterances (3 minutes of audio in total) with the same sampling rate and bits per sample as the speaker-independent training data. I followed the tutorial starting from Step 4 and created the adapted model, but the problem is that the results of Julius with the speaker-independent and adapted models are almost the same:
Speaker independent:
SENT: %Correct=0.00 [H=0, S=36, N=36]
WORD: %Corr=44.38, Acc=18.41 [H=229, D=19, S=268, I=134, N=516]
Adapted:
SENT: %Correct=0.00 [H=0, S=36, N=36]
WORD: %Corr=44.77, Acc=22.67 [H=231, D=20, S=265, I=114, N=516]
Has anyone tried this tutorial? What results did you get? What could be the reason for these results?
Thanks
--- (Edited on 4/18/2009 5:05 am [GMT-0500] by Visitor) ---
Hi Rauf,
>I followed the tutorial starting from Step 4 and created the adapted
>model, but the problem is that the results of Julius with the
>speaker-independent and adapted models are almost the same
Did you review the HVite log you created in the forced alignment step in Step 4?
The test results for the speaker-independent model do not look very good... have you tried just creating an acoustic model from your 30 utterances for comparison? Are you sure you have the correct sampling rate/bits per sample for your speaker-independent acoustic model?
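One quick way to check the parameterisation is HTK's HList: something like the following (the file name is just an example) prints the header of a feature file, i.e. the sample kind, frame period and bytes per sample, without dumping the data:
HList -h -z adapt/sample001.mfc
The parameter kind and frame period it reports should match what the speaker-independent model was trained on.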
Ken
--- (Edited on 4/20/2009 11:58 am [GMT-0400] by kmaclean) ---
Sorry, I didn't mention that it isn't a telephone dialing application.
This is an LVCSR task with a 64,000-word dictionary; the speaker-independent models were trained on 17 hours of audio.
I reviewed the HVite_Log file and it looks OK; the sampling rate and bits per sample are also OK.
The result with HTK's LVCSR tool HDecode is 65% for the speaker-independent models; when I also use the adaptation files, it warns "no token survived at the end of sent" on almost every utterance.
There is surely some problem with the adaptation, but I don't know what. This is my adaptation script (a sketch of the regtree.hed it references follows right after the script):
echo running HVite...
HVite -T 1 -l * -o SWT -C config -a -H xhmm5\models -i adaptPhones.mlf -m -t 1000.0 -I adaptWords.mlf -y lab -S adapt.scp dict_without_sp treeg.list > HViteLog
echo running HHEd...
HHEd -T 1 -H xhmm5\models -M classes regtree.hed treeg.list
echo running HERest 2 times...
HERest -T 1 -C config -C config.global -S adapt.scp -I adaptPhones.mlf -u a -J classes -K xforms mllr1 -H xhmm5/models treeg.list
HERest -T 1 -a -C config -C config.rc -S adapt.scp -I adaptPhones.mlf -u a -J xforms mllr1 -K xforms mllr2 -J classes -H xhmm5/models treeg.list
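(For reference, a regtree.hed in the style of the HTKBook adaptation tutorial is roughly the three lines below; "stats" is the occupation-statistics file written by the last training pass of HERest, 32 is the number of terminal nodes of the regression tree, and all the names are only placeholders, so they may differ from this setup.)
RN "models"
LS "stats"
RC 32 "rtree"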
And this is how I invoke HDecode for the adapted model:
HDecode -a 2.0 -t 250.0 -v 100.0 -s 17.0 -p 0.0 -o SWT -C config.hdecode -i recout.mlf -w tg1_1 -J xforms mllr2 -J classes -H xhmm5/models dict_without_sp treeg.list -S test.scp
This is for the speaker-independent model:
HDecode -a 2.0 -t 250.0 -v 100.0 -s 17.0 -p 0.0 -C config.hdecode -i recout.mlf -w tg1_1 -H xhmm5/models dict_without_sp treeg.list -S test.scp
--- (Edited on 4/21/2009 1:07 am [GMT-0500] by Visitor) ---
Hi Rauf,
I have done some adaptation experiments of my own, but only with HDecode; I have not tried Julius yet. I took the recordings from user "ralfherzog" (6620 sentences for adaptation, taken from the training set, and 100 sentences for testing), and these are my results:
Without adaptation:
SENT: %Correct=63.64 [H=63, S=36, N=99]
WORD: %Corr=93.21, Acc=90.24 [H=783, D=2, S=55, I=25, N=840]
With adaptation:
SENT: %Correct=68.69 [H=68, S=31, N=99]
WORD: %Corr=95.12, Acc=92.26 [H=799, D=1, S=40, I=24, N=840]
So there is a visible improvement. Here are the commands used for the adaptation:
HHEd -H $hmmdir/$dir/macros -H $hmmdir/$dir/hmmdefs -M $hmmdir/classes scripts/regtree.hed tiedlist
HERest -C $config -C scripts/config.global -S $trainlist -I MLF/aligned.mlf -H $hmmdir/$dir/macros -u a -J $hmmdir/classes -K $hmmdir/xforms mllr1 -H $hmmdir/$dir/hmmdefs -h *\\%%%%*.mfc tiedlist
HERest -a -C $config -C scripts/config.rc -S $trainlist -I MLF/aligned.mlf -H $hmmdir/$dir/macros -u a -J $hmmdir/xforms mllr1 -J $hmmdir/classes -K $hmmdir/xforms mllr2 -H $hmmdir/$dir/hmmdefs -h *\\%%%%*.mfc tiedlist
and for testing:
Adapted:
hdecode -T 1 -m -H $hmmdir/$dir/hmmdefs -H $hmmdir/$dir/macros -J $hmmdir/xforms mllr2 -J $hmmdir/classes -h *\\%%%%*.mfc -C $hdecode_config -S $testlist -t 220.0 -l rec -w $language_model -p 0.0 -s 15.0 $hdecode_dict xwrdtiedlist
Speaker independent:
hdecode -H $hmmdir/$dir/hmmdefs -H $hmmdir/$dir/macros -C $hdecode_config -S $testlist -t 220.0 -l rec -w $language_model -p 0.0 -s 15.0 $hdecode_dict xwrdtiedlist
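The SENT/WORD figures above are the standard HResults scoring output; a typical scoring call (the reference MLF and word-list names here are only placeholders) looks something like:
HResults -I ref.mlf wordlist rec/*.rec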
All configs, scripts, etc. were taken from the tutorial in the HTKBook 3.4.1.
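From memory, the two adaptation configs boil down to roughly the following; I may be misremembering details, so double-check against the book. config.global drives the first, global MLLR pass:
HADAPT:TRANSKIND = MLLRMEAN
HADAPT:USEBIAS = TRUE
HADAPT:BASECLASS = global
HADAPT:ADAPTKIND = BASE
HADAPT:KEEPXFORMDISTINCT = TRUE
and config.rc drives the second pass, which uses the regression class tree built by HHEd:
HADAPT:TRANSKIND = MLLRMEAN
HADAPT:USEBIAS = TRUE
HADAPT:REGTREE = rtree.tree
HADAPT:ADAPTKIND = TREE
HADAPT:SPLITTHRESH = 1000.0
HADAPT:KEEPXFORMDISTINCT = TRUE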
@Ken: What is the reason for reviewing the HVite log from the alignment? Is it that HTK may drop some sentences because of over-pruning and not report any errors (unless you enable tracing)? That can be solved by turning off pruning, i.e. leaving out the -t option (see the example below). Or is there something I missed?
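In Rauf's setup above, turning off pruning in the forced alignment would mean something like:
HVite -T 1 -l * -o SWT -C config -a -H xhmm5\models -i adaptPhones.mlf -m -I adaptWords.mlf -y lab -S adapt.scp dict_without_sp treeg.list > HViteLog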
One more thing: the tutorial here is valid for older HTK versions only. HTK 3.4 does not have a separate adaptation tool (HEAdapt); everything is done by HERest.
--- (Edited on 21.04.2009 17:30 [GMT+0200] by tpavelka) ---
--- (Edited on 21.04.2009 17:35 [GMT+0200] by tpavelka) ---
Sorry, the second set of results was wrong (the same as the first one) the first time I posted this, so it is wrong if you got it by email notification. It is corrected now.
Also note the -m option for HDecode when running the adapted version. Without this option, HDecode does not use any transforms, even if you specify the directories in which to find them (the -J option).
--- (Edited on 21.04.2009 17:40 [GMT+0200] by tpavelka) ---
Thanks for the reply!
Yes, your results are really good. Sorry, I missed the -m option in my previous post; I do use it when running HDecode.
I don't know why it shows the warning "no token survived at the end of sentence".
When running the speaker-independent models with the same pruning options, it works OK.
I tried -t 350 and -t 500, with the same warning. Bigger values are very slow, so I didn't test them.
--- (Edited on 4/22/2009 12:28 am [GMT-0500] by Visitor) ---
-t 500 is a pretty wide threshold; I use 220, and even that is awfully slow (~10x RT). The "no token survived at the end of sentence" warning can come from numerical problems (zero division, etc.) or maybe from a big mismatch between the acoustic model and the data. But that's only guessing. If you are working with VoxForge data, I can upload my acoustic models and the XForms for ralfherzog, if that would be any help.
--- (Edited on 22.04.2009 08:35 [GMT+0200] by tpavelka) ---
OK, here is my whole "training recipe" including models (zip, 75 MB). As these things usually are, it is a kind of a mess, so if you have any questions, just post them here.
Tomas
--- (Edited on 22.04.2009 09:13 [GMT+0200] by tpavelka) ---