Re: svn checkout
User: kmaclean
Date: 4/10/2007 8:26 am
Views: 245
Rating: 12

Hi gongdusheng,

>If I can cook up a script to generate the acoustic and language models, I'll be sure to contribute it.  Perhaps you are willing to build Sphinx4 models in addition to Julius models?

Yes, Sphinx Models are on my to-do list, but I have not had a chance to set this up on VoxForge.  The plan is to have HTK/Julius, Sphinx and ISIP Acoustic Models generated daily (and when the corpus gets too big,  create them weekly).  If you can create some Sphinx4 scripts, that would definitely speed things up.

Can Sphinx4 Acoustic Models (AMs) be used with Sphinx2 & 3?  I thought that Sphinx4 did not have its own native AM generator, and that you needed to use the Sphinx 2 training scripts and then convert the resulting models for Sphinx4 (or Sphinx 3).  Is that still the case?

thanks,

Ken 

P.S. I'm on hold with 1&1 to find out why the VF Repository server is down - downloading audio should not have caused this problem ...

 

 

--- (Edited on 4/10/2007 9:26 am [GMT-0400] by kmaclean) ---

Re: svn checkout
User: kmaclean
Date: 4/10/2007 8:45 am
Views: 215
Rating: 14

>I'll try with the 1GB of data I just downloaded before trying to get the whole shebang. 

BTW, you should not have to download the whole corpus.  The audio in the Main/ directory is basically the 'normalized' audio data (all audio in the Original/ directory, with different sampling rates and bits per sample, converted to standardized sampling/bit rates).  Just pick either the 16kHz_16bit directory or the 8kHz_16bit directory, download the contents, and create your acoustic model from that.

Ken 

 

--- (Edited on 4/10/2007 9:45 am [GMT-0400] by kmaclean) ---

Re: svn checkout
User: gongdusheng
Date: 4/10/2007 9:00 pm
Views: 255
Rating: 3

Noted.  I was only downloading from the 16kHz_16bit.

I'm not a Sphinx expert.  In fact, I'm finding that the Sphinx documentation for creating models is seriously outdated.  From what I can gather, SphinxTrain is used to create Sphinx3 acoustic models.  The CMU tools are used to create the language model.  Then Sphinx4 requires the lot to be packaged into a JAR, which is done by their ant build script.  I'm not sure if the ant script does anything to change the models.

BTW, I'm planning on using just the audio transcripts to create the language model.  I think I saw somewhere in another forum post that this might not be the best way to create a language model, however.  Is it possible to create a language model based on data that is not in the audio?

-alex 

--- (Edited on 4/10/2007 9:00 pm [GMT-0500] by gongdusheng) ---

Re: svn checkout
User: kmaclean
Date: 4/10/2007 9:45 pm
Views: 246
Rating: 9

>  Is it possible to create a language model based on data that is not in the audio?

I don't know off hand - I have not played much with Language Model creation. 

I was hoping that Sanyaade would be able to clarify the process, but he has not had a chance to create his HTK LM training recipe.  His post seems to indicate that the language model must be based on the same dictionary used to train the Acoustic Models.  Not sure if this limitation would apply to Sphinx also.

Ken 

 

--- (Edited on 4/10/2007 10:45 pm [GMT-0400] by kmaclean) ---

Re: svn checkout
User: Tony Robinson
Date: 4/10/2007 10:17 pm
Views: 2817
Rating: 5

>  Is it possible to create a language model based on data that is not in the audio?

> I don't know off hand - I have not played much with Language Model creation.

Yes, it's perfectly possible, and indeed normal practice in large vocabulary work, to create an LM based on separate data from the audio.   A typical English broadcast news system might have ~100 hours of audio data with ~20k unique words for the acoustic model training and ~1m words of LM texts with ~60k unique words.
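To make the point concrete, here is a minimal Python sketch of estimating bigram probabilities from text that is separate from the audio transcripts. The sentences, the function names and the add-one smoothing are all assumptions chosen for illustration; this is not the CMU or HTK language modelling toolkit, which use better smoothing methods.

```python
from collections import Counter


def train_bigram_counts(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(unigrams, bigrams, prev, word):
    """Add-one smoothed estimate of P(word | prev); crude, for illustration only."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Hypothetical data: transcripts of the recorded audio (acoustic model side)
audio_transcripts = ["open the door", "close the door"]

# Hypothetical data: separate text used only for the language model
lm_text = ["open the window", "close the window please", "open the door"]

unigrams, bigrams = train_bigram_counts(lm_text)

# Words the LM knows about that never occur in the audio transcripts
audio_vocab = {w for s in audio_transcripts for w in s.lower().split()}
lm_vocab = set(unigrams) - {"<s>", "</s>"}
lm_only_words = sorted(lm_vocab - audio_vocab)

p_window = bigram_prob(unigrams, bigrams, "the", "window")
print(lm_only_words, p_window)
```

The LM vocabulary ends up larger than the vocabulary of the audio transcripts, which is the situation described above. Every LM word still needs a pronunciation in the recognition dictionary.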

> I was hoping that Sanyaade would be able to clarify the process, but he has not had a chance to create his HTK LM training recipe.  His post seems to indicate that the language model must be based on the same dictionary used to train the Acoustic Models.

You can certainly have two different dictionaries, one that covers all of the words in the training set and one that covers all of the words to be recognised.   Of course, this means that recognition will use some triphone contexts that you didn't see in training, but (a) you get this anyway with n-gram grammars and (b) that's the motivation behind the top-down clustering techniques included in packages such as HTK (see section 10.5 of the HTK book).
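As a rough illustration of the unseen-context issue, the following Python sketch lists the word-internal triphones that a recognition dictionary needs but a training dictionary never provides. Both dictionaries, their phone symbols and the helper names are hypothetical. The labels use the l-c+r notation familiar from HTK, and cross-word contexts are ignored to keep the example short.

```python
def word_internal_triphones(phones):
    """Return the left-centre+right triphones found inside one pronunciation."""
    return {
        f"{phones[i - 1]}-{phones[i]}+{phones[i + 1]}"
        for i in range(1, len(phones) - 1)
    }


def triphones_in(dictionary):
    """Collect the word-internal triphones over every entry of a dictionary."""
    result = set()
    for phones in dictionary.values():
        result |= word_internal_triphones(phones)
    return result


# Hypothetical dictionary covering the words in the acoustic training set
training_dict = {
    "cat": ["k", "ae", "t"],
    "sat": ["s", "ae", "t"],
}

# Hypothetical dictionary covering the words to be recognised
recognition_dict = {
    "cat": ["k", "ae", "t"],
    "cap": ["k", "ae", "p"],
    "sap": ["s", "ae", "p"],
}

# Contexts needed at recognition time that were never seen in training
unseen = sorted(triphones_in(recognition_dict) - triphones_in(training_dict))
print(unseen)
```

Each context in `unseen` has no training data of its own, so its model has to be shared with a similar context that was seen. That sharing is the job of the clustering step mentioned above.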

Hope this helps,

 

Tony

-- 

Dr Tony Robinson, CEO Cantab Research Ltd
Phone:  +44 845 009 7530, Fax: +44 845 009 7532


--- (Edited on 11-April-2007 4:17 am [GMT+0100] by Tony Robinson) ---
