VoxForge
Unfortunately, Festival voice output is not of the quality we need in order to create good Acoustic Models. We would essentially be training the Acoustic Models to recognize Festival output, rather than human speech. We need human speech to ensure that the Acoustic Models can recognize other humans.
Having said that, I have heard some commercial-quality TTS (Text-to-Speech) engines that come pretty close to a human voice, but I get the sense that, given the current state of the technology, we still need human speech to train the Acoustic Models.
Hope that clarifies things a bit,
Ken