Acoustic Model Discussions

Creating Sphinx Acoustic Models
User: kmaclean
Date: 7/22/2008 4:32 pm
Views: 30686
Rating: 23

Someone emailed me to ask how to create Sphinx acoustic models using the VoxForge Speech Corpus.  Based on the information from the CMU Robust Group Tutorial (Learning to use the CMU SPHINX Automatic Speech Recognition system), here is my reply:

I've downloaded and compiled SphinxTrain, SphinxBase and PocketSphinx.

To create acoustic models, I've run the SphinxTrain scripts (as described in the Robust tutorial) on the AN4 database using these commands:

  1. perl scripts_pl/make_feats.pl -ctl etc/an4_train.fileids # converts the wav files to feature files
  2. perl scripts_pl/RunAll.pl # creates the acoustic models
  3. perl scripts_pl/decode/slave.pl # runs the PocketSphinx speech recognition engine against the an4 test data

The results were bad, but it is a small database...

You can set up a new environment to use the VoxForge corpus as follows:
  1. cd an4
  2. perl scripts_pl/copy_setup.pl -task VoxForge
This basically copies the an4 directory structure and contents to a new directory called VoxForge, and then you can change the required files to use the VoxForge Speech corpus.

Audio (referred to as Acoustic Signals in the Robust Tutorial)

Download the audio from the VoxForge Repository (8kHz-16bit) and put it in the VoxForge/wav directory.  You might want to use the wget utility to automate this.

Create a script to parse the VoxForge prompts file to create a new VoxForge_train.fileids file.  The prompts file is in this format:

jaiger-12032006-6/mfc/vf6-01 HE CRIED AND SWUNG THE CLUB WILDLY
jaiger-12032006-6/mfc/vf6-02 SHE TURNED FEARING THAT JACQUES MIGHT SEE WHAT WAS IN HER FACE
...

Which needs to be converted to this format:

jaiger-12032006-6/wav/vf6-01
jaiger-12032006-6/wav/vf6-02
...

Make sure the paths of the audio files in the wav directory correspond to the paths listed in the VoxForge_train.fileids file.  Note that some files in the Repository are in FLAC format - these need to be converted to WAV (or omitted from the VoxForge_train.fileids file, for now...)
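
Here is a minimal sketch, in Python, of the kind of script described above for turning prompts lines into fileids entries.  It assumes every prompts line starts with a speaker/mfc/utterance path followed by the words, as in the sample; the function names are made up for illustration, and you would still need to read your own prompts file and write etc/VoxForge_train.fileids yourself.

```python
def prompt_to_fileid(line):
    """Turn 'speaker/mfc/utt WORDS ...' into 'speaker/wav/utt'."""
    # The first whitespace-separated field is the utterance path.
    utt_path = line.split(None, 1)[0]
    parts = utt_path.split("/")
    # Assumption: the second-to-last path component is the 'mfc' directory.
    parts[-2] = "wav"
    return "/".join(parts)


def prompts_to_fileids(lines):
    """Convert every non-blank prompts line to a fileids entry."""
    return [prompt_to_fileid(line) for line in lines if line.strip()]


if __name__ == "__main__":
    sample = [
        "jaiger-12032006-6/mfc/vf6-01 HE CRIED AND SWUNG THE CLUB WILDLY",
        "jaiger-12032006-6/mfc/vf6-02 SHE TURNED FEARING THAT JACQUES "
        "MIGHT SEE WHAT WAS IN HER FACE",
    ]
    for fileid in prompts_to_fileids(sample):
        print(fileid)
```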

Next, convert the wav files to feature files using the make_feats.pl script (pointing it at the new VoxForge fileids file rather than the an4 one):
perl scripts_pl/make_feats.pl -ctl etc/VoxForge_train.fileids
This will take a long time to complete...

Transcriptions

Create a script to convert the lines of the VoxForge prompts file and write them to the etc/VoxForge_train.transcription file in this format:

<s> HE CRIED AND SWUNG THE CLUB WILDLY </s> (jaiger-12032006-6/wav/vf6-01)
<s> SHE TURNED FEARING THAT JACQUES MIGHT SEE WHAT WAS IN HER FACE </s> (jaiger-12032006-6/wav/vf6-02)

Note: I am not sure if Sphinx accepts paths in the name... if not, you will have to rename all the audio files so that they are unique.
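
A minimal Python sketch of the transcription conversion, under the same assumption about the prompts format (a speaker/mfc/utterance path followed by the words).  The function name is made up for illustration, and the sketch keeps the path inside the parentheses exactly as in the example above; if you rename the audio files, the ID would need to change to match.

```python
def prompt_to_transcription(line):
    """Turn 'speaker/mfc/utt WORDS ...' into '<s> WORDS ... </s> (speaker/wav/utt)'."""
    utt_path, words = line.split(None, 1)
    parts = utt_path.split("/")
    # Assumption: the second-to-last path component is the 'mfc' directory.
    parts[-2] = "wav"
    utt_id = "/".join(parts)
    # Collapse any runs of whitespace between the words.
    text = " ".join(words.split())
    return "<s> " + text + " </s> (" + utt_id + ")"


if __name__ == "__main__":
    print(prompt_to_transcription(
        "jaiger-12032006-6/mfc/vf6-01 HE CRIED AND SWUNG THE CLUB WILDLY"))
```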

Phones

Copy the VoxForge Phone list into etc/VoxForge.phone.  Here is the VF phone list:

ax
sp
ae
b
l
ow
n
d
m
t
ey
iy
s
ix
k
sh
aa
z
er
eh
dx
ng
ay
ih
jh
ao
r
aw
ah
v
hh
p
uw
y
ch
w
f
th
g
uh
dh
oy
zh
sil

Note: you might not need the sp model - HTK does not in certain circumstances, I don't know about Sphinx.
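
Before training, it may help to check that the phone list and the pronunciation dictionary agree.  The Python sketch below is only an illustration with made-up function names; it assumes the dictionary is already in the word-then-phones format and does not reproduce whatever checks SphinxTrain itself performs.

```python
def load_phones(lines):
    """Read a phone list with one phone per line, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def dictionary_phones(lines):
    """Collect every phone used in 'WORD phone phone ...' dictionary lines."""
    phones = set()
    for line in lines:
        fields = line.split()
        phones.update(fields[1:])  # the first field is the word itself
    return phones


def compare_phones(phone_lines, dict_lines):
    """Return (phones listed but never used, phones used but not listed)."""
    listed = load_phones(phone_lines)
    used = dictionary_phones(dict_lines)
    return sorted(listed - used), sorted(used - listed)


if __name__ == "__main__":
    unused, missing = compare_phones(
        ["ax", "sp", "ey", "z"],
        ["A ax", "A'S ey z"],
    )
    print("listed but unused:", unused)
    print("used but not listed:", missing)
```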

Pronunciation Dictionary (language dictionary)

Create a script to modify the VoxForge lexicon (i.e. pronunciation dictionary), which is in this format:

A               [A]             ax
A'READY         [A'READY]       ax r eh d iy
A'S             [A'S]           ey z
A(2)            [A]          ey
...

Into this format (i.e. with the bracketed return word removed):

A                            ax
A'READY               ax r eh d iy
A'S                        ey z
A(2)                     ey
...

and copy it to etc/VoxForge.dic.

HTK's HDMan command can do this too.
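
If you would rather script it, here is a minimal Python sketch of the lexicon conversion.  It assumes the bracketed return word is always the second whitespace-separated field and contains no spaces, and that a single space between fields is acceptable in the output; the function name is made up for illustration.

```python
def lexicon_to_sphinx(line):
    """Drop the bracketed second field from 'WORD [WORD] phone phone ...'."""
    fields = line.split()
    # Assumption: the return word, when present, is the second field.
    if len(fields) > 1 and fields[1].startswith("[") and fields[1].endswith("]"):
        del fields[1]
    return " ".join(fields)


if __name__ == "__main__":
    sample = [
        "A               [A]             ax",
        "A'READY         [A'READY]       ax r eh d iy",
        "A'S             [A'S]           ey z",
        "A(2)            [A]          ey",
    ]
    for entry in sample:
        print(lexicon_to_sphinx(entry))
```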

Filler (filler dictionary)

Just use the one that is already there (etc/VoxForge.filler).

Language Model

I am not sure how to create a language model - the Robust Tutorial says to "check the CMU SLM Toolkit page for an excellent manual".  With HTK/Julius (which I am more familiar with....), you don't need one if you are just creating a grammar (for command and control applications - not dictation).

You should be able to use Keith Vertanen's English Gigaword Language Model.  Copy it to etc/VoxForge.ug.lm.  You may need to convert it to a dump file (etc/VoxForge.ug.lm.DMP) - there should be a tool (lm3g2dmp?) on the Sphinx site to do this.

Any feedback and/or corrections would be greatly appreciated,

thanks,

Ken

 

--- (Edited on 7/22/2008 5:32 pm [GMT-0400] by kmaclean) ---

Update: new tutorial: Training Acoustic Model For CMUSphinx

--- (Edited on 7/22/2010 11:55 pm [GMT-0400] by kmaclean) ---

Re: Creating Sphinx Acoustic Models
User: nsh
Date: 7/28/2008 6:27 pm
Views: 502
Rating: 18

A few comments

>  Note: I am not sure if Sphinx accepts paths in the name... if not, you will have to rename all the audio files so that they are unique.

It's better to rename them.

> Note: you might not need the sp model - HTK does not in certain circumstances, I don't know about Sphinx.

Unlike HTK, Sphinx inserts fillers automatically, so you don't need sp.

>  I am not sure how to create a language model - the Robust Tutorial says to "check the CMU SLM Toolkit page for an excellent manual".  With HTK/Julius (which I am more familiar with....), you don't need one if you are just creating grammar (for command and control applications - not dictation). 

You can use jsgf grammars. Something like:

#JSGF V1.0;
/* JSGF Grammar for Turtle example */
grammar goforward;
public <move> = GO FORWARD TEN METERS;
public <move2> = GO <direction> <distance> [METER | METERS];
<direction> = FORWARD | BACKWARD;
<distance> = ONE | TWO | THREE | FOUR | FIVE | SIX | SEVEN | EIGHT | NINE | TEN;

 

--- (Edited on 7/28/2008 6:27 pm [GMT-0500] by nsh) ---

Re: Creating Sphinx Acoustic Models
User: bakuzen
Date: 1/22/2009 1:20 pm
Views: 301
Rating: 11

This may be a little late, but you can make LMs online here:

http://www.speech.cs.cmu.edu/tools/lmtool-adv.html


Just upload your transcript and it gives you a few files including the LM and sentences surrounded by start and finish tags.
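
A minimal Python sketch for preparing a plain-text file from the VoxForge prompts to upload.  It assumes the tool wants one sentence per line with no utterance IDs, which is a guess on my part, and the function names are made up for illustration.

```python
def prompt_to_sentence(line):
    """Strip the leading utterance path from a prompts line, keeping the words."""
    parts = line.split(None, 1)
    if len(parts) < 2:
        return ""  # blank line, or a path with no words after it
    return " ".join(parts[1].split())


def prompts_to_corpus(lines):
    """Return one sentence per line, skipping lines that have no words."""
    sentences = [prompt_to_sentence(line) for line in lines]
    return "\n".join(s for s in sentences if s)


if __name__ == "__main__":
    print(prompts_to_corpus([
        "jaiger-12032006-6/mfc/vf6-01 HE CRIED AND SWUNG THE CLUB WILDLY",
        "jaiger-12032006-6/mfc/vf6-02 SHE TURNED FEARING THAT JACQUES "
        "MIGHT SEE WHAT WAS IN HER FACE",
    ]))
```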

 

--- (Edited on 1/22/2009 1:20 pm [GMT-0600] by bakuzen) ---

Re: Creating Sphinx Acoustic Models
User: prabhat
Date: 3/7/2009 10:48 am
Views: 186
Rating: 12

Hi Friend,

The given URL is only for English language models. For another language written in ASCII using the English alphabet, could you tell me a way to do that?

Thanks

Prabhatkr@G  (ie Gmail.com)

--- (Edited on 3/7/2009 10:48 am [GMT-0600] by Visitor) ---

Re: Creating Sphinx Acoustic Models
User: kmaclean
Date: 3/7/2009 12:57 pm
Views: 264
Rating: 12

Hi Prabhat,

>The given URL is only for English language models. For another language written in ASCII using the English alphabet, could you tell me a way to do that?

I am assuming you mean this URL:

http://www.speech.cs.cmu.edu/tools/lmtool-adv.html

Which allows you to create a language model (trigram) based on a text file you supply. Doing this creates three files:

  • Sentences
  • Dictionary
  • Language Model

I am a little confused by your question... are you looking for a way to create a pronunciation dictionary for languages other than English using English phonemes, or are you looking for a way to create a language model for another language?  If the latter, then these toolkits might help you out:

Ken

 

--- (Edited on 3/7/2009 1:57 pm [GMT-0500] by kmaclean) ---

Re: Creating Sphinx Acoustic Models
User: John
Date: 4/5/2009 9:34 am
Views: 258
Rating: 10

Hi,

I followed the instructions and made my data set for training. But when I finally run RunAll.pl, it aborts in the verification phase saying that the data in the "mydata.phone" file is not present in the "mydata_train.transcription" file.

Do I need to make the mydata.phone file myself? If yes, what is the way to do it?

Any help is greatly appreciated :)

John

--- (Edited on 4/5/2009 9:34 am [GMT-0500] by Visitor) ---

Re: Creating Sphinx Acoustic Models
User: kmaclean
Date: 4/22/2009 1:47 pm
Views: 265
Rating: 11

Hi John,

>Do I need to make the mydata.phone file myself? If yes, what is the way to do it?

I don't think so... I don't use Sphinx very much... Regardless, I don't recall having to do anything more than what was listed in the original post,

Ken

--- (Edited on 4/22/2009 2:47 pm [GMT-0400] by kmaclean) ---

Re: Creating Sphinx Acoustic Models
User: Moiz
Date: 8/17/2009 11:53 pm
Views: 163
Rating: 8

Sir, how do I use this acoustic model in Sphinx4?

Please guide me with any steps or other pointers.

--- (Edited on 8/17/2009 11:53 pm [GMT-0500] by Visitor) ---

Re: Creating Sphinx Acoustic Models
User: kmaclean
Date: 8/18/2009 12:13 pm
Views: 6512
Rating: 11

Hi Moiz,

See this post: Sphinx3 model to Sphinx4 model

Ken

--- (Edited on 8/18/2009 1:13 pm [GMT-0400] by kmaclean) ---

Re: Creating Sphinx Acoustic Models
User: kmaclean
Date: 3/20/2011 8:53 pm
Views: 2269
Rating: 10

This paper provides a good description of how to compile an acoustic model for CMU Sphinx using the VoxForge speech corpus: Automatic Transcript Generator for Podcast Files (pdf)

 

--- (Edited on 3/20/2011 9:54 pm [GMT-0400] by kmaclean) ---
