Comments

Untitled
User: Visitor
Date: 10/10/2006 6:14 pm
Views: 2133
Probably a stupid question, but I'll ask anyway: instead of collecting a
lot of samples, couldn't we just generate them using something like
Festival or other text-to-speech software?
Re: Untitled
User: kmaclean
Date: 10/10/2006 7:08 pm
Views: 520

Unfortunately, Festival's voice output is not of the quality we need to create good Acoustic Models. We would essentially be training the Acoustic Models to recognize Festival output rather than human speech. We need human speech to ensure that the Acoustic Models can recognize other humans.

Having said that, I have heard some commercial-quality TTS (Text-to-Speech) engines that come pretty close to a human voice, but I get the sense that, with the current state of the technology, we still need human speech to train the Acoustic Models.

Hope that clarifies things a bit,

Ken 
