VoxForge
Usually, the first step in building the Pronunciation Dictionary is to create a sorted list of the words contained in your Grammar, one per line, with their pronunciations (the phonemes that make up each word). With our current example, it is easy to create an initial one by hand (see Initial Pronunciation Dictionary).
However, for HTK to be able to compile your speech audio and transcriptions into an Acoustic Model, it requires a phonetically balanced Pronunciation Dictionary with, at the very least, 30-40 'sentences' of 8-10 words each. If your Grammar has fewer sentences or words than this (as ours does in this tutorial), or if it is not phonetically balanced (some phonemes occur only once or twice), then you need to add words so that each phoneme occurs at least 3-5 times in the Pronunciation Dictionary.
Therefore, for this tutorial, we need to add words to our Pronunciation Dictionary so that HTK can compile an Acoustic Model. Remember, we are only trying to get the minimum number of pronunciation dictionary entries that will let HTK compile; creating an Acoustic Model that produces consistent recognition results requires many more entries, and corresponding speech audio.
To create a pronunciation dictionary in HTK, we will follow these steps:
First, we need to create a prompts.txt file that includes our Grammar words and the additional words required to create a phonetically balanced dictionary. This file contains the list of words that need to be recorded and the names of the audio files in which the recordings will be stored, one recording per line. You will make these recordings in Step 3.
Go to the 'voxforge/tutorial' folder you created in your home folder and create a file called 'prompts.txt' containing the following:
*/sample1 DIAL ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE OH ZERO
The first column of the prompts.txt file contains the name of the audio file to be created, and the following columns contain the transcription of what is to be recorded in that audio file.
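As a rough illustration of this two-part line format, here is a minimal Python sketch (not one of the tutorial's tools) that numbers a list of sentences as prompts.txt lines. The helper name `make_prompts` is hypothetical, and it assumes every line follows the `*/sampleN WORDS` pattern of the example above, with N counting up from 1.

```python
def make_prompts(sentences, prefix="*/sample"):
    """Return one prompts.txt line per sentence: audio file name, then the words.

    Hypothetical helper; assumes the "*/sampleN" naming shown in the tutorial.
    """
    lines = []
    for i, sentence in enumerate(sentences, start=1):
        words = sentence.upper().split()  # transcriptions are upper case, as in prompts.txt
        lines.append(f"{prefix}{i} {' '.join(words)}")
    return lines


if __name__ == "__main__":
    for line in make_prompts(["dial one two three four five six seven eight nine oh zero"]):
        print(line)
```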
The Julia script prompts2wlist.jl takes the prompts.txt file you just created, removes the file name in the first column, and writes each word on its own line to a word list file (wlist).
Download prompts2wlist.jl to your voxforge/bin folder.
Next, go back to your 'voxforge/tutorial' directory (where your prompts.txt file is located), and run prompts2wlist.jl as follows:
julia ../bin/prompts2wlist.jl prompts.txt wlist
This will create the wlist file.
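To show what this step does, here is a minimal Python sketch of the same idea; it is not the tutorial's Julia script. It assumes the script drops the first column, de-duplicates the words, and sorts them (as the sorted wlist described in this section suggests). The `extra_entries` parameter is a hypothetical hook for HTK-internal entries such as SENT-END, which the note below says are added automatically.

```python
import sys


def prompts_to_wlist(prompt_lines, extra_entries=()):
    """Return the sorted, de-duplicated words of prompts.txt, without the file-name column."""
    words = set(extra_entries)  # e.g. HTK-internal entries such as SENT-END
    for line in prompt_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and lines with no transcription
        words.update(fields[1:])  # column 1 is the audio file name
    return sorted(words)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (script name hypothetical): python prompts2wlist.py prompts.txt wlist
    with open(sys.argv[1]) as src:
        wlist = prompts_to_wlist(src, extra_entries=("SENT-END",))
    with open(sys.argv[2], "w") as dst:
        dst.write("\n".join(wlist) + "\n")
```

Using a set makes repeated words (for example, a digit that appears in several prompts) appear only once in wlist, which is what a dictionary lookup needs.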
Note: the following entries were automatically added to your wlist file (in sorted order):
SENT-END
These are HTK internal entries, required for creating the Acoustic Model and for Julius to process it.
The next step is to add pronunciation information (i.e. the phonemes that make up each word) to every word in the wlist file, thereby creating a Pronunciation Dictionary. HTK's HDMan command goes through the wlist file, looks up the pronunciation of each word in a separate lexicon file, and writes the result to a Pronunciation Dictionary.
First, create the global.ded script (the default script used by HDMan) in your 'voxforge/tutorial' folder, containing:
AS sp
RS cmu
MP sil sil sp
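These commands append the short-pause model "sp" to the end of every pronunciation (AS sp), remove CMU-style stress markers from the phones (RS cmu), and merge any "sil sp" sequence into a single "sil" (MP sil sil sp). See the HTK Book for details of these commands.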
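To make the three commands concrete, here is a minimal Python sketch of their effect on a single pronunciation, applied in script order. It is an illustration, not HDMan's implementation; in particular, it assumes that `RS cmu` means stripping a trailing 0/1/2 stress digit from phone names.

```python
import re


def apply_global_ded(phones):
    """Apply AS sp, RS cmu and MP sil sil sp (in that order) to one pronunciation."""
    # AS sp: append the short-pause model to the end of the pronunciation.
    phones = phones + ["sp"]
    # RS cmu: remove CMU-style stress markers (assumed: a trailing 0, 1 or 2).
    phones = [re.sub(r"[012]$", "", p) for p in phones]
    # MP sil sil sp: replace every "sil sp" sequence with a single "sil".
    merged = []
    for p in phones:
        if p == "sp" and merged and merged[-1] == "sil":
            continue  # drop the sp that directly follows sil
        merged.append(p)
    return merged


print(apply_global_ded(["w", "ah", "n"]))  # ['w', 'ah', 'n', 'sp']
print(apply_global_ded(["sil"]))           # ['sil']
```

Because MP folds "sil sp" back into "sil", a word pronounced as a lone "sil" ends up with no "sp"; that would be consistent with the dlog output shown later reporting 112 "sp" for 114 words if two of those words are pronounced "sil".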
Create a new directory called 'lexicon' in your 'voxforge' folder, and copy VoxForgeDict.txt (the lexicon the VoxForge phone set comes from) into it as 'voxforge/lexicon/VoxForgeDict.txt', which is the path the HDMan command expects. Then execute the HDMan command from your 'voxforge/tutorial' directory as follows:
HDMan -A -D -T 1 -m -w wlist -n monophones1 -i -l dlog dict ../lexicon/VoxForgeDict.txt
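The output of the above HDMan command is two files: dict, the Pronunciation Dictionary for the words in wlist, and monophones1, the list of all distinct phones used in it (requested with the -n option).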
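Conceptually, the lookup HDMan performs here resembles the following minimal Python sketch. It is not HDMan: it assumes lexicon lines of the form `WORD [OUTSYM] p1 p2 ...`, keeps only the first pronunciation of each word, and skips the global.ded edits. The helper names are hypothetical.

```python
def load_lexicon(lines):
    """Parse lines of the assumed form 'WORD [OUTSYM] p1 p2 ...' into a dict."""
    lexicon = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, rest = fields[0], fields[1:]
        outsym = None
        if rest and rest[0].startswith("[") and rest[0].endswith("]"):
            outsym, rest = rest[0], rest[1:]  # e.g. "[ONE]" or "[]"
        lexicon.setdefault(word, (outsym, rest))  # first pronunciation wins
    return lexicon


def build_dict(wlist, lexicon):
    """Look up each wlist word; return (dict lines, sorted phones, missing words)."""
    dict_lines, phones, missing = [], set(), []
    for word in wlist:
        if word not in lexicon:
            missing.append(word)
            continue
        outsym, pron = lexicon[word]
        parts = [word] + ([outsym] if outsym else []) + pron
        dict_lines.append(" ".join(parts))
        phones.update(pron)  # distinct phones, the role monophones1 plays
    return dict_lines, sorted(phones), missing
```

The `missing` list plays the same role as the "words required, N missing" line in HDMan's log.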
To help you determine whether your dictionary is phonetically balanced, review the 'dlog' log file written by your HDMan command:
WARNING: no script file ../lexicon/VoxForgeDict.ded

Dictionary Usage Statistics
---------------------------
Dictionary     TotalWords  WordsUsed  TotalProns  PronsUsed
VoxForgeDict       268089        114      268089        114
dict                  114        114         114        114

114 words required, 0 missing

New Phone Usage Counts
---------------------
 1. ae  :  17    2. b   :  32    3. ah  :  74    4. l   :  48
 5. ow  :   9    6. n   :  43    7. sp  : 112    8. d   :  26
 9. aa  :   8   10. m   :  13   11. z   :   7   12. ih  :  23
13. sh  :   7   14. aw  :   4   15. ng  :   7   16. t   :  32
17. k   :  33   18. ch  :   5   19. iy  :  12   20. v   :   8
21. w   :   4   22. y   :   8   23. uw  :   7   24. p   :  11
25. er  :  13   26. eh  :  24   27. r   :  21   28. f   :   5
29. g   :   8   30. s   :  15   31. th  :   7   32. hh  :  10
33. ey  :  20   34. dh  :   4   35. ao  :   6   36. ay  :  12
37. zh  :   7   38. uh  :   5   39. oy  :   4   40. jh  :   3
41. sil :   2

Dictionary dict created
Although reviewing this log will not conclusively determine whether your pronunciation dictionary is phonetically balanced (the log cannot show phones that are missing altogether because your grammar is so small), it is a good place to start.
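If you want to check these counts programmatically, the following minimal Python sketch pulls the phone usage counts out of the dlog text. It assumes entries in the `N. phone : count` form shown in the log above; the function and pattern names are hypothetical.

```python
import re

# Matches entries such as "7. sp : 112" in the "New Phone Usage Counts" section.
COUNT_RE = re.compile(r"(\d+)\.\s+(\S+)\s*:\s*(\d+)")


def phone_counts(dlog_text):
    """Return {phone: count} from the 'New Phone Usage Counts' part of a dlog file."""
    start = dlog_text.find("New Phone Usage Counts")
    if start == -1:
        return {}  # section not found
    section = dlog_text[start:]  # ignore the dictionary statistics above it
    return {phone: int(count) for _, phone, count in COUNT_RE.findall(section)}
```

The pattern does not depend on line breaks, so it also works if the counts are all run together on one line.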
For HTK to compile your Acoustic Model, you need (at the very least) 3 to 5 usage counts for each phone. If any phone occurs fewer times than that, you must add words that use it to your prompts.txt file: search the VoxForgeDict file for words containing the phones you need, and include those words.
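A minimal Python sketch of this search follows. It flags phones whose count is below a chosen minimum (3 by default, the low end of the range above) and lists lexicon words that contain a given phone. The helper names are hypothetical, and it assumes lexicon lines of the form `WORD [OUTSYM] p1 p2 ...`. The optional `phone_set` argument addresses the caveat above: any phone you list there that never appears in the log is treated as having a count of 0.

```python
def low_phones(counts, minimum=3, phone_set=()):
    """Phones used fewer than `minimum` times, rarest first.

    Phones listed in phone_set but absent from counts are treated as count 0.
    """
    merged = {p: 0 for p in phone_set}
    merged.update(counts)
    return sorted((p for p, n in merged.items() if n < minimum),
                  key=lambda p: (merged[p], p))


def candidate_words(lexicon_lines, phone, limit=5):
    """Return up to `limit` lexicon words whose pronunciation contains `phone`."""
    found = []
    for line in lexicon_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        phones = [f for f in fields[1:] if not f.startswith("[")]  # drop [OUTSYM]
        if phone in phones:
            found.append(fields[0])
            if len(found) == limit:
                break
    return found
```

The sketch only suggests candidates; you still add the chosen words to prompts.txt as new prompt lines and rerun the steps above.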
You will also need a second monophones file in a later step. Copy the "monophones1" file to a new "monophones0" file in your 'voxforge/tutorial' directory, and then remove the short-pause "sp" entry from monophones0.
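The same copy-and-delete can be sketched in a few lines of Python. This is an illustrative sketch, not a tutorial tool; it assumes monophones1 lists one phone per line and removes only lines that are exactly "sp" (ignoring surrounding whitespace).

```python
from pathlib import Path


def make_monophones0(lines):
    """Return the monophones1 lines without the short-pause 'sp' entry."""
    return [line for line in lines if line.strip() != "sp"]


if __name__ == "__main__":
    src = Path("monophones1")
    if src.exists():  # run from your 'voxforge/tutorial' directory
        kept = make_monophones0(src.read_text().splitlines())
        Path("monophones0").write_text("\n".join(kept) + "\n")
```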