VoxForge
Hi Peter,
After a bit of research, here is what I found:
There is an excellent overview of question creation for HTK on the ISLE (Illinois Speech and Language Engineering) site. From Lecture 6 - HMM Refinement (Speech Recognition Tools mini-course taught by Mark Hasegawa-Johnson):
[...] “Clustered triphones” are triphones that depend not on the phoneme labels of both neighbors, but instead, only on the class of the neighboring phones; for example, given that the neighboring phone is a vowel, /k/ might only be sensitive to whether it is a front vowel or a back vowel.
[HTK's] tree-based clustering algorithm accepts a long list of allowable phonetic class distinctions, or “questions” phrased in the following form:
QS "L_Nasal" { ng-*,n-*,m-* }
QS "R_Nasal" { *+ng,*+n,*+m }
The first “question” specifies that if the left phone is /m,n,ng/, then it is nasal, otherwise not. The second “question” specifies a similar distinction for the right phone. Obviously, there are an enormous number of possible phonetic distinctions that one might ask about. The HHEd command TB examines the statistics of the training corpus, in order to determine which of these possible questions is most useful for each phoneme.
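Since each phonetic class yields one left-context and one right-context question, the QS lines can be generated rather than typed by hand. Here is a minimal Python sketch of that idea (my own illustration, not from the lecture); the function name make_questions and the PHONE_CLASSES table are hypothetical, and the class memberships are just the ones quoted in this post:

```python
# Hypothetical phone classes, taken from the example questions in this post.
PHONE_CLASSES = {
    "Nasal": ["ng", "n", "m"],
    "Fricative": ["s", "f", "th", "sh", "z", "v", "dh", "zh"],
}


def make_questions(classes):
    """Return HHEd QS lines: one left- and one right-context question per class."""
    lines = []
    for name, phones in classes.items():
        # "p-*" matches any model whose left context is p
        left = ",".join(f"{p}-*" for p in phones)
        # "*+p" matches any model whose right context is p
        right = ",".join(f"*+{p}" for p in phones)
        lines.append(f'QS "L_{name}" {{ {left} }}')
        lines.append(f'QS "R_{name}" {{ {right} }}')
    return lines


if __name__ == "__main__":
    print("\n".join(make_questions(PHONE_CLASSES)))
```

For the "Nasal" class this reproduces the two QS lines quoted above.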
He then goes on to describe the following steps in tree-based clustering (I am paraphrasing his description of the process ... see his original article for details):
Execute HERest to accumulate the statistics necessary for tree-based clustering. This creates mmf/triphone.stats, and a new set of models in mmf/t2/triphones:
HERest -A -s mmf/triphone.stats -I mlf/triphones.mlf -B -C cfg/train.cfg -S scp/train.scp -H mmf/triphones -M mmf/t2 lists/triphones.txt;
Create a new file called cluster.hhed (an HHEd edit file). The first line in this new file reads in the statistics:
RO 20.0 "mmf/k01i1/triphone.stats"
Next you have many lines of questions:
QS "L_Fricative" { s-*,f-*,th-*,sh-*,z-*,v-*,dh-*,zh-* }
...
QS "R_Palatovelar" { *+sh,*+zh,*+y,*+k,*+g }
Next you specify the particular states that you want to consider for clustering, using a series of "TB commands".
For example, the following command tells HTK to consider how best to cluster the second states of all of the different triphones based on the gender-dependent phone aa_m:
TB 100.0 "aa_mS2" {(aa_m,*-aa_m,aa_m+*,*-aa_m+*).state[2]}
The TB commands "grow the trees".
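Because you need one TB command per phone and per state, these lines are also easy to generate. A rough Python sketch (again my own, not from the lecture): make_tb_commands is a hypothetical helper, and the default of states 2 to 4 is an assumption that your models have three emitting states, so adjust it to your own prototype:

```python
def make_tb_commands(phones, states=(2, 3, 4), threshold=100.0):
    """Return one HHEd TB line per (phone, state) pair.

    states=(2, 3, 4) assumes three emitting states per model (an assumption,
    not something stated in the lecture).
    """
    lines = []
    for phone in phones:
        for state in states:
            # Item list covering the monophone, both biphone forms and all triphones
            items = f"({phone},*-{phone},{phone}+*,*-{phone}+*).state[{state}]"
            lines.append(f'TB {threshold} "{phone}S{state}" {{{items}}}')
    return lines


if __name__ == "__main__":
    print("\n".join(make_tb_commands(["aa_m"])))
```

With phones=["aa_m"] the first generated line is the same TB command shown above.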
The AU command actually creates the new clustered-triphone acoustic models, by merging models from the input triphone list (in this case lists/triphones.txt).
Finally, the ST command writes out the trees, and the CO command is used to write out a list of the new clustered-triphone models:
ST lists/triphone_trees.txt
CO lists/clustered.txt
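To show how the pieces fit together, here is a small Python sketch that assembles a complete edit file in the order described above (RO, QS, TB, AU, ST, CO). It is only an illustration: build_edit_script is a hypothetical helper, the unquoted form of the AU argument is my assumption (mirroring the ST and CO lines), and the statistics path should be whatever file your HERest -s option actually wrote:

```python
def build_edit_script(stats_path, questions, tb_commands,
                      model_list, trees_path, out_list, outlier=20.0):
    """Assemble the text of an HHEd edit file for tree-based clustering."""
    lines = [f'RO {outlier} "{stats_path}"']  # read in the HERest statistics
    lines += questions                        # QS lines
    lines += tb_commands                      # TB lines ("grow the trees")
    lines.append(f"AU {model_list}")          # create the clustered models
    lines.append(f"ST {trees_path}")          # write out the trees
    lines.append(f"CO {out_list}")            # write out the new model list
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = build_edit_script(
        "mmf/triphone.stats",
        ['QS "L_Nasal" { ng-*,n-*,m-* }', 'QS "R_Nasal" { *+ng,*+n,*+m }'],
        ['TB 100.0 "aa_mS2" {(aa_m,*-aa_m,aa_m+*,*-aa_m+*).state[2]}'],
        "lists/triphones.txt",
        "lists/triphone_trees.txt",
        "lists/clustered.txt",
    )
    print(script, end="")
```

You would then save the returned text as your edit file (for example ed/cluster.hhed) before running HHEd.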
Running HHEd then executes the commands in your edit file and writes the resulting models to the file mmf/clustered:
HHEd -A -B -T 1 -H mmf/triphones -w mmf/clustered ed/cluster.hhed lists/triphones.txt
The Phonetik BAS (Bavarian Archive for Speech Signals) has a pronunciation lexicon called PHONOLEX. Its web page has a link to Bas-Sampa, which includes phoneme groupings that I think might be useful for creating HTK "questions" (and so tree clusters) for German.
Hope this helps,
Ken
--- (Edited on 8/16/2007 5:16 pm [GMT-0400] by kmaclean) ---
--- (Edited on 5/29/2015 3:25 pm [GMT-0400] by kmaclean) ---
--- (Edited on 8/20/2007 4:14 am [GMT-0500] by Visitor) ---