General Discussion

Split development?
User: kmaclean
Date: 11/16/2009 1:32 pm
Views: 4793
Rating: 8

Sam wrote in an email:

Can we make one side of the project that only accepts user submitted wav's and one that accepts high quality ogg's and mp3's from librivox? Last time I checked, there were 3000 works that were recorded, and many of them are over an hour long. Even though lossy formats are not ideal, it may be necessary to speed up corpus completion.

--- (Edited on 11/16/2009 2:32 pm [GMT-0500] by kmaclean) ---

Re: Split development?
User: kmaclean
Date: 11/16/2009 1:34 pm
Views: 105
Rating: 7

my reply:

Hi Sam

Can we make one side of the project that only accepts user submitted wav's and one that accepts high quality ogg's and mp3's from librivox? Last time I checked, there were 3000 works that were recorded, and many of them are over an hour long. Even though lossy formats are not ideal, it may be necessary to speed up corpus completion.

This can be easily accommodated with the current infrastructure (see this thread for more detailed info on this subject: http://www.voxforge.org/home/docs/faq/faq/what-kind-of-audio-formats-is-voxforge-looking-for).

We have been accepting LibriVox Audio book chapters in wav format ( http://www.voxforge.org/home/submit/audiobooks ).  We have also tried creating acoustic models using high quality mp3 (http://www.voxforge.org/home/dev/mp3-compare) with OK results - the resulting acoustic model was not as good as using native wav audio, but since there is so much good quality mp3/ogg audio data available on the LibriVox site, this should not be a big problem...

There should be no need to divide the project. We can simply download the mp3/ogg audio we want from LibriVox (rather than asking readers to upload it to VoxForge after they have already uploaded it to LibriVox), make the changes required for acoustic model training, and then upload the resulting wav files to VoxForge.

Just describe your audio in the README file as follows (so we know that you started with compressed audio, and that the reading is a LibriVox reading):

    File Info:
    File type:mp3@128kbps->wav (or ogg->wav)
    Sampling rate: [44.1kHz];
    Sample rate format: [16bit];
    Number of channels: [1];
    Audio Processing: unknown.

    AudioBook Info:
    Source: [LibriVox];
    Book: [Etiquette in Society, in Business, in Politics and at Home];
    Chapter: [Chapter 35 - The Kindergarten of Etiquette];
    Author: [Emily Post];
    Reader: [chocoholic].

Note that you need to segment the audio into files of 15-25 words each and create a prompt file with the name of the audio file in the first column and the sentence on the remainder of the line.  We also need pronunciations for words not currently in the VoxForge dictionary.
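As a rough illustration of the prompt-file layout and the dictionary check described above, here is a minimal Python sketch. It is not the Audiobook.pm script. The function names, the example file ID, and the choice to compare words in upper case are all assumptions made for the example, and it presumes the audio has already been segmented and matched to its text.

```python
import re

def format_prompt_line(audio_name, sentence):
    """Return one prompt-file line: audio file name first, sentence after it."""
    # Collapse runs of whitespace so the file name stays the only first column.
    return f"{audio_name} {' '.join(sentence.split())}"

def within_word_limits(sentence, lo=15, hi=25):
    """True if the segment holds between lo and hi words (15-25 per the post)."""
    return lo <= len(sentence.split()) <= hi

def find_oov(sentences, dictionary_words):
    """List words that are missing from the dictionary, in first-seen order.

    Assumption: dictionary entries are upper case, so words are upper-cased
    before the lookup. Adjust if your copy of the dictionary differs.
    """
    seen = set()
    missing = []
    for sentence in sentences:
        for word in re.findall(r"[A-Za-z']+", sentence):
            key = word.upper()
            if key not in dictionary_words and key not in seen:
                seen.add(key)
                missing.append(key)
    return missing

def write_prompts(path, segments):
    """Write (audio_name, sentence) pairs to a prompt file, one per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for audio_name, sentence in segments:
            handle.write(format_prompt_line(audio_name, sentence) + "\n")
```

Segments that fail `within_word_limits` are probably worth re-cutting by hand at a natural pause rather than splitting on word count alone, since the text has to stay aligned with the audio.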

I've written a Perl script to automate some of this (Audiobook.pm - see inline documentation). 

If you want to try segmenting some LibriVox mp3/ogg speech (converting it to wav first, since neither HTK nor Sphinx can deal with compressed audio natively) and creating the pronunciations for the out-of-vocabulary words, that would be greatly appreciated.
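For the conversion step, one possible approach (an assumption, since this thread does not name a tool) is to call the ffmpeg command-line program from Python. The sketch below assumes ffmpeg is installed and on your PATH. The function names are made up for the example, and the defaults mirror the README template above: 44.1kHz, 16-bit, one channel.

```python
import subprocess
from pathlib import Path

def build_ffmpeg_command(src, dst, sample_rate=44100, channels=1):
    """Build the ffmpeg argument list for an mp3/ogg -> 16-bit PCM wav conversion."""
    return [
        "ffmpeg",
        "-i", str(src),            # compressed input (mp3 or ogg)
        "-ac", str(channels),      # number of output channels
        "-ar", str(sample_rate),   # output sampling rate in Hz
        "-c:a", "pcm_s16le",       # 16-bit little-endian PCM, i.e. plain wav audio
        str(dst),
    ]

def convert_to_wav(src, dst=None):
    """Run ffmpeg on one file; by default writes next to the source as .wav."""
    src = Path(src)
    dst = Path(dst) if dst else src.with_suffix(".wav")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst
```

Calling `convert_to_wav("chapter35.mp3")` (a hypothetical file name) would write `chapter35.wav` alongside it. Segmenting into 15-25 word files still has to happen afterwards.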

I would like to add this thread to the VoxForge forum - please let me know if this is OK.

Ken

--- (Edited on 11/16/2009 2:34 pm [GMT-0500] by kmaclean) ---

Re: Split development?
User: kmaclean
Date: 11/16/2009 1:34 pm
Views: 1936
Rating: 7

Sam's reply:

Add it to the forum! It greatly clarifies something that I've been wondering about for a while.

--- (Edited on 11/16/2009 2:34 pm [GMT-0500] by kmaclean) ---
