VoxForge
This operation is similar to the HLEd word-to-phone mapping performed in Step 4. In this case, however, the HVite command can consider all pronunciations of each word (where a word has more than one pronunciation) and output the one that best matches the acoustic data.
Execute the HVite command as follows:
Linux:
HVite -A -D -T 1 -l '*' -o SWT -b SENT-END -C config -H hmm7/macros -H hmm7/hmmdefs -i aligned.mlf -m -t 250.0 150.0 1000.0 -y lab -a -I words.mlf -S train.scp dict monophones1 > HVite_log
Windows:
HVite -A -D -T 1 -l * -o SWT -b SENT-END -C config -H hmm7/macros -H hmm7/hmmdefs -i aligned.mlf -m -t 250.0 150.0 1000.0 -y lab -a -I words.mlf -S train.scp dict monophones1 > HVite_log
This creates the aligned.mlf file.
Review the output of the HVite command very carefully. Catching errors here will save a lot of headaches later, because seemingly minor problems at this step sometimes show up as major errors at later steps, where they are very difficult to trace back to here.
Here is the log output from the above command: hvite_log.
It is time well spent to review the log to make sure that HVite recognized all the words for each line in your prompts file.
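One way to support this review is to compare the utterances listed in words.mlf with those that made it into aligned.mlf. The following is only a minimal sketch and not part of the tutorial's own tooling: the function names mlf_entries and missing_from_alignment are made up for this example, and it assumes that the only lines starting with a double quote in an MLF file are the entry names (such as "*/sample1.lab").

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not HTK commands.

# Print the utterance names found in an MLF file, one per line, sorted.
# Assumption: entry lines are the only lines that start with a double quote.
mlf_entries() {
    grep '^"' "$1" \
        | tr -d '\r' \
        | sed -e 's/^"//' -e 's/"$//' -e 's|.*/||' -e 's/\.[^.]*$//' \
        | sort -u
}

# Print utterances that appear in the first MLF but not in the second.
missing_from_alignment() {
    tmp_words=$(mktemp) || return 1
    tmp_aligned=$(mktemp) || return 1
    mlf_entries "$1" > "$tmp_words"
    mlf_entries "$2" > "$tmp_aligned"
    # comm -23 keeps lines that are unique to the first (sorted) file.
    comm -23 "$tmp_words" "$tmp_aligned"
    rm -f "$tmp_words" "$tmp_aligned"
}
```

Calling missing_from_alignment words.mlf aligned.mlf after the HVite run prints the names of any utterances that have no entry in aligned.mlf. An empty result only shows that every utterance has an entry; it does not replace reading HVite_log.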
Next, run HERest two more times:
HERest -A -D -T 1 -C config -I aligned.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm7/macros -H hmm7/hmmdefs -M hmm8 monophones1
The files created by this command are hmm8/macros and hmm8/hmmdefs.
HERest -A -D -T 1 -C config -I aligned.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm8/macros -H hmm8/hmmdefs -M hmm9 monophones1
The files created by this command are hmm9/macros and hmm9/hmmdefs.
Note: the monophone models created in hmm9 could actually be used with Julius for speech recognition, but recognition accuracy can be greatly improved by using tied-state triphones (see the next sections).
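The two HERest passes above differ only in their input and output directories, so they can be written as a loop. This is a sketch under assumptions: run_herest_passes and the DRY_RUN variable are names introduced for this example (DRY_RUN is not an HTK option), and the mkdir call assumes the output directories may not exist yet. By default the function only prints the commands so they can be checked against the ones given above.

```shell
#!/bin/sh
# Sketch only: run_herest_passes and DRY_RUN are hypothetical names.

# Print (default) or run the two re-estimation passes: hmm7 -> hmm8 -> hmm9.
run_herest_passes() {
    src=7
    while [ "$src" -lt 9 ]; do
        dst=$((src + 1))
        # Build the same HERest command line as in the text above.
        set -- HERest -A -D -T 1 -C config -I aligned.mlf \
            -t 250.0 150.0 3000.0 -S train.scp \
            -H "hmm$src/macros" -H "hmm$src/hmmdefs" -M "hmm$dst" monophones1
        if [ "${DRY_RUN:-1}" = 1 ]; then
            # Dry run: show the command without executing it.
            echo "$*"
        else
            # Assumption: create the output directory if it is missing.
            mkdir -p "hmm$dst"
            "$@" || return 1
        fi
        src=$dst
    done
}
```

Running run_herest_passes prints both commands; running DRY_RUN=0 run_herest_passes executes them in order and stops if the first pass fails, so that hmm9 is never built from an incomplete hmm8.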