VoxForge
In the last step you created HMMs that did not
include an "sp" (short pause) silence model, which represents the
short pauses that occur between words in normal speech.
However, you did create a "sil" silence model. sil models are
typically of longer duration and represent the pauses that occur at the end
of a sentence.
The HTK book says that the sp model needs to have its "emitting state tied to the centre state of the silence model". This means that you need to create a new sp model in your hmmdefs that uses the centre state of sil, and then 'tie' the two models together so that they share that state.
This can be done by copying the centre state from the sil model in your hmmdefs file into the new sp model, and then running an HTK tool called HHEd to 'tie' the sp model to the sil model so that they share the same centre state. The HTK book provides some background on what this means, but you need an understanding of the basics of Hidden Markov Modelling before tackling the HTK Book explanations (the University of Leeds HMM tutorial is a very good introduction to Hidden Markov Modelling).
Note: you do not need to understand HMMs to complete this tutorial.
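First copy the contents of the hmm3 folder to hmm4. Then, using an editor, create a new "sp" model in hmm4/hmmdefs: copy the centre state (state 3) of the sil model into a new 3-state model as its state 2, and give the new model the following transition matrix:
 0.0 1.0 0.0
 0.0 0.9 0.1
 0.0 0.0 0.0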
Your sp model should look something like this:
~h "sp"
<BEGINHMM>
<NUMSTATES> 3
<STATE> 2
<MEAN> 25
 -7.118713e+00 -3.081000e-01 -1.749457e+00 -1.031217e+00 -1.165559e+00 3.531405e+00 3.928634e+00 1.414117e+00 4.853880e+00 3.436303e+00 8.821907e-01 3.578307e+00 -7.581001e-02 -5.415154e-02 2.016401e-01 -8.739231e-02 4.121462e-02 8.872373e-02 -7.476506e-02 9.644009e-03 1.259353e-01 -1.379244e-01 -3.326035e-02 1.257626e-01 1.159135e-01
<VARIANCE> 25
 8.348813e+00 8.595272e+00 1.137517e+01 1.209942e+01 8.636264e+00 1.499810e+01 9.930653e+00 1.164548e+01 1.039156e+01 8.853771e+00 8.320317e+00 9.811087e+00 9.367380e-01 7.999393e-01 1.583295e+00 9.593871e-01 7.300864e-01 1.651675e+00 1.296489e+00 9.965155e-01 1.218978e+00 1.052432e+00 9.433080e-01 1.521487e+00 5.511271e-01
<GCONST> 7.427084e+01
<TRANSP> 3
 0.0 1.0 0.0
 0.0 0.9 0.1
 0.0 0.0 0.0
<ENDHMM>
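The manual edit described above can also be scripted. The following is only a minimal Python sketch, not part of HTK or the original tutorial: the helper names make_sp_model and build_hmm4 are hypothetical, and the code assumes that hmmdefs is a text file with one upper-case tag per line (as HTK normally writes it) and that the centre state of sil is state 3.

```python
import shutil
from pathlib import Path

# Transition matrix for the new 3-state sp model (taken from the tutorial).
SP_TRANSP = [" 0.0 1.0 0.0", " 0.0 0.9 0.1", " 0.0 0.0 0.0"]


def make_sp_model(hmmdefs_text: str) -> str:
    """Return an "sp" model definition built from the centre state of "sil"."""
    lines = hmmdefs_text.splitlines()
    # Locate the sil definition: from its ~h header to the next <ENDHMM>.
    start = next(i for i, line in enumerate(lines)
                 if line.split() == ["~h", '"sil"'])
    end = next(i for i in range(start, len(lines))
               if lines[i].strip() == "<ENDHMM>")

    # Collect the body of state 3: everything between "<STATE> 3" and the
    # next <STATE> or <TRANSP> tag.
    centre, in_centre = [], False
    for line in lines[start:end]:
        tokens = line.split()
        if tokens[:1] == ["<STATE>"]:
            in_centre = tokens[1:2] == ["3"]
            continue
        if tokens[:1] == ["<TRANSP>"]:
            in_centre = False
        if in_centre:
            centre.append(line)
    if not centre:
        raise ValueError('no "<STATE> 3" found in the sil model')

    sp = ['~h "sp"', "<BEGINHMM>", "<NUMSTATES> 3", "<STATE> 2", *centre,
          "<TRANSP> 3", *SP_TRANSP, "<ENDHMM>"]
    return "\n".join(sp) + "\n"


def build_hmm4(src: str = "hmm3", dst: str = "hmm4") -> None:
    """Copy src to dst, then append the new sp model to dst/hmmdefs."""
    shutil.copytree(src, dst, dirs_exist_ok=True)  # needs Python 3.8+
    hmmdefs = Path(dst) / "hmmdefs"
    text = hmmdefs.read_text()
    if not text.endswith("\n"):
        text += "\n"
    hmmdefs.write_text(text + make_sp_model(text))
```

Calling build_hmm4() from your voxforge/tutorial folder would copy hmm3 to hmm4 and append the generated sp model to hmm4/hmmdefs. It is worth comparing the result against a hand-edited file before relying on it.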
Next, run the HTK HMM editor, HHEd, to "tie" the sp state to the sil centre state. Tying means that two or more HMMs share the same set of parameters. To do this, create the following HHEd command script, called sil.hed, in your voxforge/tutorial folder:
AT 2 4 0.2 {sil.transP}
AT 4 2 0.2 {sil.transP}
AT 1 3 0.3 {sp.transP}
TI silst {sil.state[3],sp.state[2]}
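The AT commands add transitions to the sil and sp transition matrices (for example, "AT 2 4 0.2" adds a transition from state 2 to state 4 with probability 0.2). The last line is the "tie" command. Next run HHEd as follows, this time using the monophones1 file, which contains the sp model: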
HHEd -A -D -T 1 -H hmm4/macros -H hmm4/hmmdefs -M hmm5 sil.hed monophones1
The files created by this command are hmm5/macros and hmm5/hmmdefs.
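If you prefer to script the tying step, the sketch below writes sil.hed and assembles the same HHEd command line. The function names are hypothetical, the HHEd arguments are copied from the command in this tutorial, and creating the hmm5 folder first is only a precaution taken by this sketch. Nothing runs until you call tie_sp_to_sil(), which assumes HHEd is on your PATH.

```python
import os
import subprocess

# Contents of the HHEd edit script, exactly as given in the tutorial.
SIL_HED = """AT 2 4 0.2 {sil.transP}
AT 4 2 0.2 {sil.transP}
AT 1 3 0.3 {sp.transP}
TI silst {sil.state[3],sp.state[2]}
"""


def write_sil_hed(path="sil.hed"):
    """Write the edit script to disk and return its path."""
    with open(path, "w") as handle:
        handle.write(SIL_HED)
    return path


def hhed_command(src="hmm4", dst="hmm5", script="sil.hed",
                 hmm_list="monophones1"):
    """Build the HHEd argument list used in this step."""
    return ["HHEd", "-A", "-D", "-T", "1",
            "-H", f"{src}/macros", "-H", f"{src}/hmmdefs",
            "-M", dst, script, hmm_list]


def tie_sp_to_sil():
    """Write sil.hed, make sure hmm5 exists, then run HHEd."""
    write_sil_hed()
    os.makedirs("hmm5", exist_ok=True)
    subprocess.run(hhed_command(), check=True)
```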
Next run HERest two more times, this time using the monophones1 file:
HERest -A -D -T 1 -C config -I phones1.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm5/macros -H hmm5/hmmdefs -M hmm6 monophones1
The files created by this command are hmm6/macros and hmm6/hmmdefs.
HERest -A -D -T 1 -C config -I phones1.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm6/macros -H hmm6/hmmdefs -M hmm7 monophones1
The files created by this command are hmm7/macros and hmm7/hmmdefs.
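The two HERest passes differ only in their input and output folders, so they can be driven from a short loop. This is a rough sketch under the same assumptions as the commands above (config, phones1.mlf, train.scp and monophones1 are in the current folder, and HERest is on your PATH). The names herest_command, PASSES and reestimate are hypothetical, and creating each output folder is a precaution taken by this sketch.

```python
import os
import subprocess

# (input folder, output folder) for each re-estimation pass.
PASSES = [("hmm5", "hmm6"), ("hmm6", "hmm7")]


def herest_command(src, dst):
    """Build the HERest argument list for one pass from src to dst."""
    return ["HERest", "-A", "-D", "-T", "1", "-C", "config",
            "-I", "phones1.mlf", "-t", "250.0", "150.0", "3000.0",
            "-S", "train.scp",
            "-H", f"{src}/macros", "-H", f"{src}/hmmdefs",
            "-M", dst, "monophones1"]


def reestimate():
    """Run each pass in order, stopping if HERest reports an error."""
    for src, dst in PASSES:
        os.makedirs(dst, exist_ok=True)
        subprocess.run(herest_command(src, dst), check=True)
```

Because check=True is set, a failure in the first pass stops the loop before the second pass reads an incomplete hmm6 folder.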