VoxForge
Hi all,
I am trying to find the best phone-level pronunciation for each word in a
large database. I have all the word-level transcriptions and a dictionary. I
first used HLEd to create the initial phone-level transcription; in this
step, HLEd uses only the first pronunciation of each word. Then I did a
flat-start initialization of all the phone models and used HERest for
embedded training. I started from 1-mixture models and gradually split them
into 2, 4, 6, 8, 12, and 16 mixtures. Throughout this training, the initial
phone transcription was used. After the 16-mixture models were created, I ran
forced alignment with HVite. I only need the best pronunciations, not the
time stamps. But it seems strange to me that the output phone transcription
is exactly the same as the initial transcription; I verified this with
HResults.
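In case it helps, the commands were roughly like this (file names, configs, and the edit scripts are placeholders, not my real paths; the mixture-splitting step is repeated for each target count):

```
# 1. Expand the word-level MLF to phone level (first pronunciation only)
HLEd -d dict -i phones0.mlf mkphones.led words.mlf

# 2. Flat start: global mean/variance cloned into every phone model
HCompV -C config -f 0.01 -m -S train.scp -M hmm0 proto

# 3. Embedded re-estimation against the fixed initial phone transcription
HERest -C config -I phones0.mlf -t 250.0 150.0 1000.0 \
       -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones

# 4. Split mixtures with HHEd, then re-run HERest several times;
#    repeat for 2, 4, 6, 8, 12, 16 mixtures. mix2.hed contains e.g.:
#        MU 2 {*.state[2-4].mix}
HHEd -H hmm1/macros -H hmm1/hmmdefs -M hmm2 mix2.hed monophones

# 5. Forced alignment: let HVite choose the best pronunciation per word
HVite -a -m -o SWT -b sil -C config -H hmm16/macros -H hmm16/hmmdefs \
      -i aligned.mlf -I words.mlf -S train.scp -y lab dict monophones

# 6. Compare the realigned labels against the initial ones
HResults -I phones0.mlf monophones aligned.mlf
```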
I tried different mixture models (2 mixtures and 4 mixtures), but the result
is still the same. I don't even know whether this is a good thing or a bad
thing, or what the possible reasons might be. I did notice that although my
dictionary has many words with multiple pronunciations, those pronunciations
do not differ much. Maybe that's the reason? I can't tell whether it means
the initial transcriptions are already good and the models are well trained,
or whether it means the models are bad...
Can anybody give me any possible reasons?
Mike
--- (Edited on 5/20/2014 12:42 am [GMT-0500] by weedwind) ---