VoxForge
A limitation of the triphone-based acoustic model created in Step 9 is that it cannot handle triphones for which there are no examples in the training data. This problem can be avoided by careful design of the training database, but when building large-vocabulary cross-word triphone systems, unseen triphones are unavoidable.
The decision tree clustering used here allows previously unseen triphones to be synthesized: models are organized in a phonetic decision tree, and each node of the tree holds a question about a phone's context. By answering these questions, the decoder works its way down the tree and decides which model to use, even for a triphone it has never seen.
A phonetic decision tree is a binary tree in which a yes/no phonetic question is attached to each node. The question at each node is chosen to maximise the likelihood of the training data, given the final set of state tyings. Trees are defined by the TB command, and all of the possible phonetic questions must first be loaded into HHEd using QS commands. Each question takes the form "Is the left or right context in the named set?", where the context is the model context as defined by its logical name.
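For illustration, QS commands in an HHEd script look like the following (the set names and phone lists here are examples only; the actual questions come from tree1.hed below):

```
QS "L_Nasal" { n-*,m-*,ng-* }
QS "R_Nasal" { *+n,*+m,*+ng }
```

The first asks "is the left context a nasal?" (patterns of the form p-*); the second asks the same of the right context (patterns of the form *+p).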
Create a new HDMan edit script called maketriphones.ded containing the following:
AS sp MP sil sil sp TC |
The TC edit command directs HDMan to convert the monophone pronunciations to triphone notation; for example, the pronunciation b ae t becomes b+ae b-ae+t ae-t (a base phone plus its left '-' and right '+' contexts, with partial contexts at the word boundaries). The AS command appends the short-pause model sp to each pronunciation, and MP merges any sil followed by sp into a single sil.
Then execute the HDMan command against the entire lexicon file, not just the training dictionary we have used thus far:
HDMan -A -D -T 1 -b sp -n fulllist0 -g maketriphones.ded -l flog dict-tri ../lexicon/VoxForgeDict.txt |
This creates 2 files: fulllist0, a list of the triphones used in the lexicon, and dict-tri, the triphone-expanded pronunciation dictionary.
Next, download the Julia script fixfulllist.jl to your 'voxforge/bin' folder and run it to append the contents of monophones0 to the beginning of the fulllist0 file, remove any duplicate entries, and put the result in fulllist:
julia ../bin/fixfulllist.jl fulllist0 monophones0 fulllist |
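The effect of fixfulllist.jl can be approximated with standard shell tools; the one-liner below is a sketch of the same operation (assuming plain one-entry-per-line files), not a replacement for the script:

```shell
# Prepend monophones0 to fulllist0 and keep only the first
# occurrence of each line, writing the result to fulllist.
cat monophones0 fulllist0 | awk '!seen[$0]++' > fulllist
```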
Next, create a new HHEd script called tree.hed (containing the phone-context questions HTK will use to select the relevant triphones) in your 'voxforge/tutorial' folder, with the contents of tree1.hed (note: make sure you have a blank line at the end of this file). Copy the contents of tree1.hed to tree.hed:
cat tree1.hed > tree.hed |
Here is a short description of the commands contained in tree.hed:
(each QS command loads a single question and that question is defined by a set of contexts)
Refer to the HTK book for details on these commands.
Next download the mkclscript.jl script to your 'voxforge/bin' folder and run it as follows to append the state clusters to the tree.hed file you created above:
julia ../bin/mkclscript.jl monophones0 tree.hed |
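Conceptually, mkclscript.jl walks monophones0 and emits one TB clustering command per emitting state of each phone. The shell loop below is an illustrative sketch of that output only; the 350.0 threshold and the exact TB patterns are assumptions based on the standard HTK tutorial, so trust the tree.hed that mkclscript.jl actually generates:

```shell
# For each monophone, emit TB commands that cluster states 2-4 of
# all triphones sharing that base phone (sketch only).
while read -r p; do
  for s in 2 3 4; do
    printf 'TB 350.0 "%s_s%s" {("%s","*-%s+*","%s+*","*-%s").state[%s]}\n' \
      "$p" "$s" "$p" "$p" "$p" "$p" "$s"
  done
done < monophones0 >> tree.hed
```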
Note: the mkclscript.jl script automatically adds the following to the end of your tree.hed file:
TR 1 AU "fulllist" CO "tiedlist" ST "trees" |
Your file should look like this: tree.hed
Next create 3 more folders: hmm13, hmm14, and hmm15.
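On Linux or Cygwin all three folders can be created with one command:

```shell
# Create the output folders for the next three training steps.
mkdir hmm13 hmm14 hmm15
```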
Then execute the HHEd (HMM definition editor) command:
HHEd -A -D -T 1 -H hmm12/macros -H hmm12/hmmdefs -M hmm13 tree.hed triphones1 |
This command creates the clustered, tied-state model set in hmm13 (macros and hmmdefs), along with the tiedlist and trees files named in tree.hed.
Next run HERest 2 more times:
HERest -A -D -T 1 -C config -I wintri.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm13/macros -H hmm13/hmmdefs -M hmm14 tiedlist |
This command creates 2 files: hmm14/macros and hmm14/hmmdefs.
HERest -A -D -T 1 -C config -I wintri.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm14/macros -H hmm14/hmmdefs -M hmm15 tiedlist |
This creates 2 files: hmm15/macros and hmm15/hmmdefs.
The hmmdefs file in the hmm15 folder, along with the tiedlist file, can now be used with Julius to recognize your speech!
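For example, a Julius configuration (jconf) file points at these two files with the -h and -hlist options; the surrounding grammar and audio-input options depend on your setup and are not shown:

```
-h      hmm15/hmmdefs
-hlist  tiedlist
```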