VoxForge
Automatic Building of Synthetic Voices from Large Multi-Paragraph Speech Databases
This paper discusses speech segmentation issues from the perspective of creating new voices for text-to-speech. It is very applicable to our situation, where we need segmented speech to create acoustic models.
From the abstract:
Large multi paragraph speech databases encapsulate
prosodic and contextual information beyond the sentence level
which could be exploited to build natural sounding voices. This
paper discusses our efforts on automatic building of synthetic
voices from large multi-paragraph speech databases. We show
that the primary issue of segmentation of large speech file could
be addressed with modifications to forced-alignment technique
and that the proposed technique is independent of the duration
of the audio file. We also discuss how this framework could be
extended to build a large number of voices from public domain
large multi-paragraph recordings.
--- (Edited on 2/26/2010 1:09 pm [GMT-0500] by kmaclean) ---