Acoustic Model Discussions

Forced alignment did not change the transcription at all
User: weedwind
Date: 5/20/2014 12:42 am
Views: 2380
Rating: 1

Hi, All,


I am trying to find the best phoneme pronunciations for a big database. I have all the word-level transcriptions and a dictionary. I first used HLEd to create the initial phoneme-level transcription; in this step, HLEd uses only the first pronunciation of each word. Then I did a flat-start initialization of all the phone models and used HERest for embedded training. I started from single-mixture models and gradually split them into 2, 4, 6, 8, 12, and 16 mixtures. Throughout this process, the initial phoneme transcription was used for training.

After the 16-mixture models were created, I ran forced alignment with HVite. I only need the best pronunciations, not the time stamps. But it looks strange to me that the output phoneme transcription is exactly the same as the initial transcription; I verified this with HResults.
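
For reference, the realignment step I mean is roughly the HTK-book style command below (file and list names are just placeholders, not my exact setup), followed by the HResults comparison:

    HVite -a -m -o SWT -C config -t 250.0 \
          -H hmm16/macros -H hmm16/hmmdefs \
          -I words.mlf -i aligned.mlf -S train.scp \
          dict monophones

    HResults -I phones0.mlf monophones aligned.mlf

As far as I understand, -a makes HVite align against the word-level transcriptions using the dictionary, -m keeps the model (phone) labels in the output, and -o SWT drops scores, word names and times, so aligned.mlf can be compared directly with the initial phones0.mlf.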

I tried different mixture counts as well (2-mixture and 4-mixture models), but the result is still the same. I do not even know whether this is a good thing or a bad thing, or what the possible reasons might be. I noticed that although my dictionary contains many words with multiple pronunciations, those pronunciations do not differ much. Maybe that's the reason??? I don't know whether it means the initial transcriptions are already good and the models are well trained, or whether it means the models are bad...
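
In case it helps, a quick way to see how much variation the dictionary really has is to count the words with more than one entry (this assumes the usual HTK dictionary format, one "WORD phone phone ..." line per pronunciation):

    awk '{ if (!($1 in n)) total++; n[$1]++ }
         END { m = 0; for (w in n) if (n[w] > 1) m++;
               print m " of " total " words have more than one pronunciation" }' dict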

 

Can anybody give me any possible reasons?

Mike

--- (Edited on 5/20/2014 12:42 am [GMT-0500] by weedwind) ---
