VoxForge
I had someone email me and ask how to create Sphinx acoustic models using the VoxForge Speech Corpus. Based on the information from the CMU Robust Group Tutorial (Learning to use the CMU SPHINX Automatic Speech Recognition system), here is my reply:
I've downloaded and compiled SphinxTrain, SphinxBase and PocketSphinx.
To create acoustic models, I ran the SphinxTrain scripts (as described in the Robust tutorial) on the AN4 database using these commands:
$ perl scripts_pl/make_feats.pl -ctl etc/an4_train.fileids # converts wav files to features
$ perl scripts_pl/RunAll.pl # creates acoustic models
$ perl scripts_pl/decode/slave.pl # runs the PocketSphinx speech recognition engine against the an4 test data
The results were bad, but it is a small database...
You can set up a new environment to use the VoxForge corpus as follows:
cd an4
perl scripts_pl/copy_setup.pl -task VoxForge
This basically copies the an4 directory structure and contents to a new directory called VoxForge; you can then change the required files to use the VoxForge Speech Corpus.
Audio (referred to as Acoustic Signals in the Robust Tutorial)
Download the audio from the VoxForge Repository (8kHz-16bit) and put it in the VoxForge/wav directory. You might want to use the wget utility to automate this.
Create a script to parse the VoxForge prompts file to create a new VoxForge_train.fileids file. The prompts file is in this format:
jaiger-12032006-6/mfc/vf6-01 HE CRIED AND SWUNG THE CLUB WILDLY
jaiger-12032006-6/mfc/vf6-02 SHE TURNED FEARING THAT JACQUES MIGHT SEE WHAT WAS IN HER FACE
...
Which needs to be converted to this format:
jaiger-12032006-6/wav/vf6-01
jaiger-12032006-6/wav/vf6-02
...
Make sure the paths of the audio files correspond to the paths listed in the VoxForge_train.fileids file. Note: some files in the Repository are in FLAC format. These need to be converted to WAV (or omitted from the VoxForge_train.fileids file, for now...).
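A minimal sketch of such a conversion script, assuming each prompts line starts with the utterance path followed by whitespace and the prompt text, and that swapping the mfc path component for wav is all that is needed. The function name prompts_to_fileids is made up for illustration; it is not part of SphinxTrain or VoxForge.

```python
def prompts_to_fileids(prompt_lines):
    """Convert VoxForge prompt lines into fileids entries (one path per line)."""
    fileids = []
    for line in prompt_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The first whitespace-separated field is the utterance path.
        utt_path = line.split(None, 1)[0]
        # Assumption: the "mfc" directory component is replaced by "wav".
        parts = ["wav" if part == "mfc" else part for part in utt_path.split("/")]
        fileids.append("/".join(parts))
    return fileids


if __name__ == "__main__":
    sample = [
        "jaiger-12032006-6/mfc/vf6-01 HE CRIED AND SWUNG THE CLUB WILDLY",
        "jaiger-12032006-6/mfc/vf6-02 SHE TURNED FEARING THAT JACQUES MIGHT SEE WHAT WAS IN HER FACE",
    ]
    for entry in prompts_to_fileids(sample):
        print(entry)
```

The output lines could then be written to etc/VoxForge_train.fileids, one entry per line.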
Next, convert the wav files to feature files using the make_feats.pl script:
$ perl scripts_pl/make_feats.pl -ctl etc/VoxForge_train.fileids
This will take a long time to complete...
Transcriptions
Create a script to modify the VoxForge prompts file and copy the result into the etc/VoxForge_train.transcription file so that the entries are in this format:
<s> HE CRIED AND SWUNG THE CLUB WILDLY </s> (jaiger-12032006-6/wav/vf6-01)
<s> SHE TURNED FEARING THAT JACQUES MIGHT SEE WHAT WAS IN HER FACE </s> (jaiger-12032006-6/wav/vf6-02)
Note: I am not sure if Sphinx accepts paths in the name... if not, you will have to rename all the audio files so that they are unique.
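Here is a rough sketch of the transcription conversion, assuming the same prompts layout as above (utterance path, then the words). The function name prompts_to_transcription is hypothetical, and the utterance ID keeps the path as shown in the example format; whether Sphinx accepts that is the open question in the note above.

```python
def prompts_to_transcription(prompt_lines):
    """Turn VoxForge prompt lines into '<s> WORDS </s> (utterance-id)' lines."""
    transcription = []
    for line in prompt_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and entries that have no words
        # Assumption: the utterance ID is the prompt path with mfc swapped for wav.
        utt_id = fields[0].replace("/mfc/", "/wav/")
        words = " ".join(fields[1:])
        transcription.append("<s> {} </s> ({})".format(words, utt_id))
    return transcription


if __name__ == "__main__":
    sample = ["jaiger-12032006-6/mfc/vf6-01 HE CRIED AND SWUNG THE CLUB WILDLY"]
    for entry in prompts_to_transcription(sample):
        print(entry)
```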
Phones
Copy the VoxForge Phone list into etc/VoxForge.phone. Note: you might not need the sp model - HTK does not in certain circumstances, I don't know about Sphinx. Here is the VF phone list:
ax
sp
ae
b
l
ow
n
d
m
t
ey
iy
s
ix
k
sh
aa
z
er
eh
dx
ng
ay
ih
jh
ao
r
aw
ah
v
hh
p
uw
y
ch
w
f
th
g
uh
dh
oy
zh
sil
Pronunciation Dictionary (language dictionary)
Create a script to modify the VoxForge lexicon (i.e. pronunciation dictionary), which is in this format:
A [A] ax
A'READY [A'READY] ax r eh d iy
A'S [A'S] ey z
A(2) [A] ey
...
Into this format (i.e. remove the return word in brackets):
A ax
A'READY ax r eh d iy
A'S ey z
A(2) ey
...
and copy it to etc/VoxForge.dic.
HTK's HDMan command can do this too.
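If you would rather script it, something like the following might do. It is only a sketch: it assumes every lexicon line has the form "WORD [OUTPUT] phones..." and removes just the first bracketed field. The function name lexicon_to_sphinx_dict is made up for this example.

```python
import re

# Matches the bracketed return word, e.g. "[A'READY]", plus any trailing spaces.
BRACKETED_WORD = re.compile(r"\[[^\]]*\]\s*")


def lexicon_to_sphinx_dict(lexicon_lines):
    """Strip the bracketed return word from each VoxForge lexicon line."""
    entries = []
    for line in lexicon_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Remove only the first bracketed field, then normalise the spacing.
        cleaned = BRACKETED_WORD.sub("", line, count=1)
        entries.append(" ".join(cleaned.split()))
    return entries


if __name__ == "__main__":
    sample = ["A [A] ax", "A'READY [A'READY] ax r eh d iy", "A(2) [A] ey"]
    for entry in lexicon_to_sphinx_dict(sample):
        print(entry)
```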
Filler (filler dictionary)
Just use the one that is already there (etc/VoxForge.filler).
Language Model
I am not sure how to create a language model - the Robust Tutorial says to "check the CMU SLM Toolkit page for an excellent manual". With HTK/Julius (which I am more familiar with....), you don't need one if you are just creating a grammar (for command and control applications - not dictation).
You should be able to use Keith Vertanen's English Gigaword Language Model. Copy it to etc/VoxForge.ug.lm. You may need to convert it to a dump file (etc/VoxForge.ug.lm.DMP) - there should be a tool (lm3g2dmp?) on the Sphinx site to do this.
Any feedback and/or corrections would be greatly appreciated,
thanks,
Ken
--- (Edited on 7/22/2008 5:32 pm [GMT-0400] by kmaclean) ---
Update: new tutorial: Training Acoustic Model For CMUSphinx
--- (Edited on 7/22/2010 11:55 pm [GMT-0400] by kmaclean) ---
A few comments
> Note: I am not sure if Sphinx accepts paths in the name... if not, you will have to rename all the audio files so that they are unique.
It's better to rename them.
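One possible renaming scheme, as a sketch only: join the speaker directory and the file name with an underscore so every utterance ID is unique without a path. The names unique_utterance_id and build_rename_map are hypothetical, and the scheme itself is an assumption, not something Sphinx requires.

```python
def unique_utterance_id(fileid):
    """Flatten 'speaker/wav/name' into 'speaker_name' (assumed naming scheme)."""
    parts = [part for part in fileid.split("/") if part and part != "wav"]
    return "_".join(parts)


def build_rename_map(fileids):
    """Map each original fileid to its flattened name, refusing collisions."""
    mapping = {}
    seen = set()
    for fileid in fileids:
        new_id = unique_utterance_id(fileid)
        if new_id in seen:
            raise ValueError("duplicate utterance id: " + new_id)
        seen.add(new_id)
        mapping[fileid] = new_id
    return mapping


if __name__ == "__main__":
    ids = ["jaiger-12032006-6/wav/vf6-01", "jaiger-12032006-6/wav/vf6-02"]
    for old, new in build_rename_map(ids).items():
        print(old, "->", new)
```

The same mapping would have to be applied to the audio files, the fileids file and the transcription file so that all three stay in step.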
> Note: you might not need the sp model - HTK does not in certain circumstances, I don't know about Sphinx.
Unlike HTK, Sphinx inserts fillers automatically, so you don't need sp.
> I am not sure how to create a language model - the Robust Tutorial says to "check the CMU SLM Toolkit page for an excellent manual". With HTK/Julius (which I am more familiar with....), you don't need one if you are just creating grammar (for command and control applications - not dictation).
You can use JSGF grammars. Something like:
#JSGF V1.0;
/* JSGF Grammar for Turtle example */
grammar goforward;
public <move> = GO FORWARD TEN METERS;
public <move2> = GO <direction> <distance> [METER | METERS];
<direction> = FORWARD | BACKWARD;
<distance> = ONE | TWO | THREE | FOUR | FIVE | SIX | SEVEN | EIGHT | NINE | TEN;
--- (Edited on 7/28/2008 6:27 pm [GMT-0500] by nsh) ---
This may be a little late, but you can make LMs online here:
http://www.speech.cs.cmu.edu/tools/lmtool-adv.html
Just upload your transcript and it gives you a few files including the LM and sentences surrounded by start and finish tags.
--- (Edited on 1/22/2009 1:20 pm [GMT-0600] by bakuzen) ---
Hi Friend,
The given URL is only for English language models. Could you tell me a way to do this for another language written in ASCII using the English alphabet?
Thanks
Prabhatkr@G (ie Gmail.com)
--- (Edited on 3/7/2009 10:48 am [GMT-0600] by Visitor) ---
Hi Prabhat,
>The given URL is only for English language models. Could you tell me a way
>to do this for another language written in ASCII using the English
>alphabet?
I am assuming you mean this URL:
http://www.speech.cs.cmu.edu/tools/lmtool-adv.html
Which allows you to create a language model (trigram) based on a text file you supply. Doing this creates three files:
I am a little confused by your question... are you looking for a way to create a pronunciation dictionary for languages other than English using English phonemes? or are you looking for a way to create a language model for another language? If the latter, then these toolkits might help you out:
Ken
--- (Edited on 3/7/2009 1:57 pm [GMT-0500] by kmaclean) ---
Hi,
I followed the instructions and made my data set for training. But when I finally run RunAll.pl, it aborts in the verification phase, saying that the data in the "mydata.phone" file is not present in the "mydata_train.transcription" file.
Do I need to make the mydata.phone file myself? If yes, how would I do it?
Any help is greatly appreciated :)
John
--- (Edited on 4/5/2009 9:34 am [GMT-0500] by Visitor) ---
Hi John,
>Do I need to make the mydata.phone file myself? If yes, how would I do
>it?
I don't think so... I don't use Sphinx very much... Regardless, I don't recall having to do anything more than what was listed in the original post,
Ken
--- (Edited on 4/22/2009 2:47 pm [GMT-0400] by kmaclean) ---
Hi Moiz,
See this post: Sphinx3 model to Sphinx4 model
Ken
--- (Edited on 8/18/2009 1:13 pm [GMT-0400] by kmaclean) ---
This paper provides a good description of how to compile an acoustic model for CMU Sphinx using the VoxForge speech corpus: Automatic Transcript Generator for Podcast Files (pdf)
--- (Edited on 3/20/2011 9:54 pm [GMT-0400] by kmaclean) ---