Building Mandarin acoustic models
User: voxUser123
Date: 11/4/2012 8:10 pm
Views: 8814
Rating: 8

Hello!

I would like to know if there is a standard recipe for creating acoustic models for Mandarin ASR. I am working on a simple command-control application and want to create phoneme models. I am assuming that about 1 hour of speech would be enough. Could you please let me know where I may find:
1. List of phonemes in Mandarin
2. Dictionary for standard Chinese in word -> phoneme or syllable -> phoneme format (along the lines of http://www.speech.cs.cmu.edu/cgi-bin/cmudict)
3. If possible, any open source audio d/b in Mandarin

Thanks a lot.
Regards,
Ethan 

 

--- (Edited on 11/4/2012 8:10 pm [GMT-0600] by voxUser123) ---

Re: Building Mandarin acoustic models
User: nsh
Date: 11/5/2012 8:08 am
Views: 184
Rating: 3

>I would like to know if there is a standard recipe for creating acoustic models for Mandarin ASR. 

The recipe for Mandarin is no different from the recipe for other applications; you can just use the CMUSphinx acoustic model training tutorial to train a Mandarin model:

http://cmusphinx.sourceforge.net/wiki/tutorialam

Moreover, you can use the existing Mandarin models; they should be pretty accurate for command-and-control:

https://sourceforge.net/p/cmusphinx/code/11641/tree/trunk/pocketsphinx-extra/model/hmm/zh/

https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Mandarin%20Broadcast%20News%20acoustic%20models/

> I am working on a simple command-control application and want to create phoneme models.

For a command-and-control application it makes sense to use word-dependent phones, not phoneme models.

> List of phonemes in Mandarin

You can extract the list of phones from a dictionary.
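
As a rough sketch of that extraction, assuming the usual Sphinx dictionary layout (one entry per line, the word first, then whitespace-separated phones), something like the following would collect the phone set. The function name `extract_phones` and the word `WORD1` are made up for illustration; the first two sample lines are the ones quoted later in this thread.

```python
from collections import Counter


def extract_phones(dict_lines):
    """Count every phone symbol used in Sphinx-style dictionary lines."""
    phones = Counter()
    for line in dict_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and entries with no transcription
        # parts[0] is the word; everything after it is the transcription.
        phones.update(parts[1:])
    return phones


# WORD1 is a hypothetical placeholder, not a real dictionary entry.
sample = [
    "48              +GARBAGE+",
    "4c49            +GARBAGE+",
    "WORD1 k ao l ao",
]
phones = extract_phones(sample)
print(sorted(phones))
```

To run it on the real file, pass an open file object instead of the sample list, for example `open("zh_broadcastnews_utf8.dic", encoding="utf-8")`. Filler symbols such as +GARBAGE+ will appear in the result alongside the speech phones.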

>Dictionary for standard Chinese in word -> phoneme or syllable -> phoneme format

You can find the dictionary here:

https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Mandarin%20Language%20Model/

Essentially it's just pinyin, with or without tones.

> 3. If possible, any open source audio d/b in Mandarin
The easiest way to obtain a Mandarin speech database is to download voice recordings and segment them into sentences.

--- (Edited on 11/5/2012 17:08 [GMT+0300] by nsh) ---

Re: Building Mandarin acoustic models
User: voxUser123
Date: 11/5/2012 7:00 pm
Views: 294
Rating: 7

Thanks!

When I open the UTF-8 dictionary (zh_broadcastnews_utf8.dic) in MS Word (as UTF-8), I see that the first many lines correspond to garbage models (for example, see the first 3 lines below).

48              +GARBAGE+
4c49            +GARBAGE+
4c69            +GARBAGE+

and the last line as below (with proper Chinese characters)

k ao l ao

Is this OK? Is the character encoding correct for the GARBAGE and Chinese models?

--- (Edited on 11/5/2012 7:00 pm [GMT-0600] by voxUser123) ---

Re: Building Mandarin acoustic models
User: nsh
Date: 11/6/2012 1:18 am
Views: 198
Rating: 7

> Is this OK?

Yes, what exactly confuses you?

 

--- (Edited on 11/6/2012 10:18 [GMT+0300] by nsh) ---

Re: Building Mandarin acoustic models
User: voxUser123
Date: 11/6/2012 2:13 am
Views: 189
Rating: 8

I mean, why so many first lines just for +GARBAGE+ models? Is one not enough? And why use strings of numbers as "words" for the garbage model?

--- (Edited on 11/6/2012 2:13 am [GMT-0600] by voxUser123) ---

Re: Building Mandarin acoustic models
User: nsh
Date: 11/6/2012 9:03 am
Views: 242
Rating: 3

> I mean why so many first lines just for +GARBAGE+ models?

There are many garbage words that look like numbers in the corresponding language model because it was collected automatically. Every word from a language model or grammar has to be translated into a phonetic sequence somehow, so these words are translated into the garbage phone.

> One is not enough?

The words from the language model should be present in the dictionary. If your grammar doesn't use such words you can remove them.
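
A minimal sketch of that pruning step, assuming you already have the set of words your grammar uses. The name `prune_dictionary`, the words `cmd_open` and `cmd_close`, and their transcriptions are invented for illustration and are not taken from the real dictionary. The `WORD(2)` form for alternate pronunciations is the usual Sphinx dictionary convention.

```python
def prune_dictionary(dict_lines, vocabulary):
    """Keep only the dictionary entries whose word is in `vocabulary`."""
    kept = []
    for line in dict_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        # Alternate pronunciations are written WORD(2), WORD(3), ...
        base = word.split("(")[0] or word
        if base in vocabulary:
            kept.append(line.rstrip("\n"))
    return kept


# Hypothetical entries: only the two +GARBAGE+ lines come from the thread.
sample = [
    "48              +GARBAGE+",
    "4c49            +GARBAGE+",
    "cmd_open k ai",
    "cmd_open(2) k ai",
    "cmd_close g uan",
]
pruned = prune_dictionary(sample, {"cmd_open"})
print("\n".join(pruned))
```

Because the numeric-looking words are not in the grammar's vocabulary, their +GARBAGE+ entries drop out without any special handling.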

> And why use string of numbers as "words" for garbage model?

The correspondence is left-to-right, as in Western languages, not right-to-left. Garbage is used as the transcription for certain words; the words are not used as garbage.

 

--- (Edited on 11/6/2012 18:03 [GMT+0300] by nsh) ---

Re: Building Mandarin acoustic models
User: voxUser123
Date: 11/6/2012 12:30 pm
Views: 2451
Rating: 5

I see. Thank you!

--- (Edited on 11/6/2012 12:30 pm [GMT-0600] by voxUser123) ---
