VoxForge
Someone on Slashdot pointed out that LibriVox data might be useful for you. The full text of public domain books is often available online, which could serve as a transcript. Segmentation of the audio into smaller chunks might be needed for training. (I'm not sure if it would be; I'm used to training with speech files separated into sentences but I don't know if that's necessary.) But if so, maybe an automated forced alignment against the text could be used to do that.
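As a rough illustration of the "separated into sentences" idea above, here is a minimal sketch that splits a book's plain text into sentence-sized prompt candidates. The function name `split_into_sentences` is hypothetical, and the punctuation-based rule is an assumption; it will mis-split abbreviations such as "Mr." and is not a substitute for forced alignment against the audio.

```python
import re

def split_into_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace (naive rule)."""
    # Collapse line breaks and repeated spaces so wrapped lines do not matter.
    flat = " ".join(text.split())
    if not flat:
        return []
    # The lookbehind keeps the punctuation attached to the preceding sentence.
    return re.split(r"(?<=[.!?])\s+", flat)

if __name__ == "__main__":
    for sentence in split_into_sentences("It was late.\nWho called? Nobody!"):
        print(sentence)
```

Each resulting sentence could then be paired with the matching stretch of audio once the segment boundaries are known.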
--- (Edited on 10/12/2006 1:28 pm [GMT-0500] by Visitor) ---
Thanks for the reference!
LibriVox is definitely another source of audio for us. We've been looking at other ways to get speech audio into the project and have created an ever-expanding list of links in the VoxForge Dev Wiki. I've added LibriVox.
The work in this case, as you pointed out, would be in the segmenting of the audio data. I've tried a large audio file with time-stamped prompts in HTK and the processing time seemed much too long. HTK was much more efficient with smaller speech files, with no time stamps in the prompts. I have not tried either approach with Sphinx.
There could be two approaches to doing this. The first would be to create how-tos so that people who don't want to submit their own voice can segment audio books instead. The second, which you have already mentioned, would be an automated segmentation script that looks for pauses in the speech file and creates a line of prompts corresponding to the contents of each segmented speech audio file.
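A minimal sketch of the pause-detection half of such a script, assuming the audio has already been decoded into a list of 16-bit sample values. The function names, the 20 ms frame size, the energy threshold of 500 and the 300 ms minimum pause are all illustrative assumptions, not values used by VoxForge; real recordings would need the threshold tuned to their noise floor.

```python
import math

def frame_rms(samples, frame_len):
    """Return the RMS energy of each consecutive, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return energies

def split_on_pauses(samples, rate, frame_ms=20, threshold=500.0, min_pause_ms=300):
    """Return (start_sample, end_sample) pairs for speech regions.

    A region ends when at least min_pause_ms of consecutive frames fall
    below the energy threshold. All defaults are assumed, untuned values.
    """
    frame_len = int(rate * frame_ms / 1000)
    min_pause_frames = max(1, min_pause_ms // frame_ms)
    energies = frame_rms(samples, frame_len)

    segments = []
    seg_start = None   # index of the first frame of the current speech region
    silent_run = 0     # number of consecutive quiet frames seen so far
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if seg_start is None:
                seg_start = i
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                end = i - silent_run + 1  # first quiet frame of the pause
                segments.append((seg_start * frame_len, end * frame_len))
                seg_start = None
                silent_run = 0
    if seg_start is not None:
        # Audio ended before a long enough pause; trim trailing quiet frames.
        end = len(energies) - silent_run
        segments.append((seg_start * frame_len, end * frame_len))
    return segments
```

The returned sample offsets could be used to cut the large file into smaller WAV files. Matching each piece to its line of prompt text is a separate step that this sketch does not attempt.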
Another concern would be the quality of the recordings - we need uncompressed audio, which is why WAV was chosen as the format. It may be that we need to talk to LibriVox, and others, about getting their users to submit uncompressed audio to VoxForge and compressed audio to their own sites.
lots to think about, thanks again,
Ken
Hi Ken,
I don't know how far you are in automating the segmentation of librivox recordings, but I think it's useful to get back to the people from librivox anyway. As I see it, it's a waste if someone records a chapter of a book, uploads an mp3 of it to librivox and then disposes of the original recordings without even being aware of the possibility of donating this data to voxforge as well.
The people at Librivox seemed very friendly and cooperative, so that shouldn't be the problem. However, the thread in the forum that deals with this matter probably doesn't attract that many readers anymore.
Since we need hundreds of hours (eventually) for really good acoustic models, I think we could benefit greatly from a more structural approach.
That would, in my humble opinion, only require a tiny message on, for example, this page:
http://librivox.org/wiki/moin.cgi/HowToSendYourRecording
Something like:
"If you need to delete your recordings after uploading them to Librivox, consider donating your recording in an uncompressed form to the VoxForge speech recognition project too. See their site for more info."
With a link to a page on voxforge.org specially set up for Librivox users.
What do you think?
Cheers,
Robin
--- (Edited on 5/2/2007 6:36 am [GMT-0500] by Robin) ---
Hi Robin,
Excellent idea!
The Librivox community is amazingly supportive. I agree that the newsgroup post no longer has the required exposure.
I'll email Hugh McGuire and see if they are OK with such an addition to their "HowToSendYourRecording" page,
thanks,
Ken
--- (Edited on 5/2/2007 9:34 am [GMT-0400] by kmaclean) ---
Hugh said I should put it to the librivox Community ... here is the link:
Adding VoxForge link to "HowToSendYourRecording"
Ken
--- (Edited on 5/2/2007 2:58 pm [GMT-0400] by kmaclean) ---
We now have a link on the LibriVox site! ... here it is:
http://librivox.org/wiki/moin.cgi/HowToSendYourRecording
Thanks to Robin for this idea.
If anyone else has any suggestions or ideas for promoting the VoxForge site, please let me know.
Thanks,
Ken
--- (Edited on 5/3/2007 11:23 am [GMT-0400] by kmaclean) ---
Hi Robin
>I don't know how far you are in automating the segmentation of librivox recordings,
Actually, we are doing pretty well on this front - most of it is automated (see this web page for details: Automated Audio Segmentation Using Forced Alignment).
The last piece to automate is creating pronunciations for words that are not in the VoxForge pronunciation dictionary (i.e. out-of-vocabulary words). Tony Robinson helped out in this regard (see this post), and I was able to find this script to do it: t2p: Text-to-Phoneme Converter Builder.
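Before pronunciations can be generated, the out-of-vocabulary words have to be found. A minimal sketch of that lookup step follows, assuming a dictionary file in which the first whitespace-separated field of each line is the word. The function names and the sample entries are hypothetical, and this does not call or reproduce t2p.

```python
import re

def load_dictionary_words(lines):
    """Collect headwords, assuming each line starts with the word itself."""
    words = set()
    for line in lines:
        fields = line.split()
        if fields:
            words.add(fields[0].upper())
    return words

def out_of_vocabulary(text, known_words):
    """Return words in text that are missing from known_words, in first-seen order."""
    seen = set()
    missing = []
    # Treat runs of letters and apostrophes as words; everything else separates them.
    for token in re.findall(r"[A-Za-z']+", text):
        word = token.upper()
        if word not in known_words and word not in seen:
            seen.add(word)
            missing.append(word)
    return missing

if __name__ == "__main__":
    # Made-up dictionary entries, for illustration only.
    dictionary = load_dictionary_words(["THE  dh ax", "CAT  k ae t"])
    print(out_of_vocabulary("The cat sat. The cat purred.", dictionary))
```

The resulting list is what would be handed to a text-to-phoneme tool so that new entries can be added to the dictionary before alignment.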
Once I test this out, and create a script for the whole process, we will be in pretty good shape for mass conversions of LibriVox audio books (uncompressed or compressed).
Ken
--- (Edited on 5/3/2007 3:58 pm [GMT-0400] by kmaclean) ---