VoxForge
Sam wrote in an email:
Can we make one side of the project that only accepts user submitted wav's and one that accepts high quality ogg's and mp3's from librivox? Last time I checked, there were 3000 works that were recorded, and many of them are over an hour long. Even though lossy formats are not ideal, it may be necessary to speed up corpus completion.
--- (Edited on 11/16/2009 2:32 pm [GMT-0500] by kmaclean) ---
my reply:
Hi Sam
Can we make one side of the project that only accepts user submitted wav's and one that accepts high quality ogg's and mp3's from librivox? Last time I checked, there were 3000 works that were recorded, and many of them are over an hour long. Even though lossy formats are not ideal, it may be necessary to speed up corpus completion.
We have been accepting LibriVox audiobook chapters in wav format (http://www.voxforge.org/home/submit/audiobooks). We have also tried creating acoustic models from high quality mp3 (http://www.voxforge.org/home/dev/mp3-compare)
with OK results. The resulting acoustic model was not as good as one
trained on native wav audio, but since there is so much good quality
mp3/ogg audio available on the LibriVox site, this should not be a big problem.
There should be no need to divide the project. We can simply
download the mp3/ogg audio we want from LibriVox (rather than
asking readers to upload it to VoxForge after they have already
uploaded it to the LibriVox site), convert it so that the
audio can be used for acoustic model training, and then upload the
resulting wav files to VoxForge.
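As a rough illustration of the conversion step, here is a minimal Python sketch that decodes an mp3/ogg chapter to 16-bit mono wav. It assumes the ffmpeg command-line tool is installed and on the PATH; the function names and the file name in the usage note are hypothetical, and this is not what Audiobook.pm does internally.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, rate=44100, channels=1):
    """Return an ffmpeg argument list that decodes src to 16-bit PCM wav.

    Assumption: 44.1kHz mono output; adjust rate/channels as needed.
    """
    return [
        "ffmpeg",
        "-i", str(src),          # compressed input (mp3 or ogg)
        "-ac", str(channels),    # number of output channels
        "-ar", str(rate),        # output sampling rate in Hz
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        str(dst),
    ]


def convert_to_wav(src, dst=None):
    """Run ffmpeg on src; by default write next to it with a .wav suffix."""
    src = Path(src)
    dst = Path(dst) if dst else src.with_suffix(".wav")
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst
```

For example, convert_to_wav("chapter35.mp3") would write chapter35.wav in the same directory. Decoding cannot restore what the lossy codec discarded, which is why the README should still record the original format.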
Just describe your audio in the README file as follows (so we know
that you started with compressed audio, and that the reading is a
LibriVox reading):
File Info:
File type: mp3@128kbps->wav (or ogg->wav)
Sampling rate: [44.1kHz];
Sample rate format: [16bit];
Number of channels: [1];
Audio Processing: unknown.
AudioBook Info:
Source: [LibriVox];
Book: [Etiquette in Society, in Business, in Politics and at Home];
Chapter: [Chapter 35 - The Kindergarten of Etiquette];
Author: [Emily Post];
Reader: [chocoholic].
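Before filling in the File Info fields, it may help to read the values straight from the converted wav file rather than typing them from memory. The sketch below uses Python's standard wave module; the function name and dictionary keys are made up for illustration.

```python
import wave


def describe_wav(path):
    """Read sampling rate, bit depth and channel count from a wav header."""
    with wave.open(str(path), "rb") as w:
        return {
            "sampling_rate_hz": w.getframerate(),
            "bits_per_sample": w.getsampwidth() * 8,  # sample width is in bytes
            "channels": w.getnchannels(),
        }
```

A file that matches the example README above would report 44100, 16 and 1 respectively.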
Note
that you need to segment the audio into files of 15-25 words each and
create a prompt file with the name of the audio file in the first column
and the sentence on the remainder of the line. We also need pronunciations
for words not currently in the VoxForge dictionary.
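To make the prompt file layout and the out-of-vocabulary check concrete, here is a small Python sketch. The segment name, the helper names, and the idea of passing the known words in as a plain list are all assumptions for illustration; it does not parse the actual VoxForge dictionary file, and it is independent of Audiobook.pm.

```python
import re


def prompt_line(audio_name, sentence):
    """Audio file name in the first column, sentence on the rest of the line."""
    return f"{audio_name} {' '.join(sentence.split())}"


def within_length(sentence, lo=15, hi=25):
    """True if the segment has between lo and hi words (15-25 by default)."""
    return lo <= len(sentence.split()) <= hi


def out_of_vocabulary(sentences, known_words):
    """List words (uppercased, first-seen order) missing from known_words."""
    known = {w.upper() for w in known_words}
    seen = set()
    missing = []
    for sentence in sentences:
        # Treat runs of letters and apostrophes as words (an assumption).
        for word in re.findall(r"[A-Za-z']+", sentence):
            key = word.upper()
            if key not in known and key not in seen:
                seen.add(key)
                missing.append(key)
    return missing
```

Each word returned by out_of_vocabulary would then need a pronunciation before the segment can be used for training.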
I've written a Perl script to automate some of this (Audiobook.pm - see inline documentation).
If you want to try segmenting some LibriVox mp3/ogg speech (converting
it to wav first, since neither HTK nor Sphinx can read compressed audio
natively) and creating the pronunciations for the out-of-vocabulary
words, that would be greatly appreciated.
I would like to add this thread to the VoxForge forum - please let me know if this is OK.
Ken
--- (Edited on 11/16/2009 2:34 pm [GMT-0500] by kmaclean) ---