VoxForge
You can get the lexicon on the Acoustic Model builds page:
VoxForge_Dictionnary_build726.tgz
These build scripts aren't really for "general consumption" yet - regardless, I created a new scripts folder on the VoxForge Repository, and created a snapshot of the scripts folder from the dev site:
VoxForge_Scripts_Snapshot.tgz
Ken
P.S. added ticket #115 to make sure the request for SVN access doesn't get lost.
--- (Edited on 11/ 8/2006 1:24 pm [GMT-0500] by kmaclean) ---
--- (Edited on 12/26/2006 2:59 pm [GMT-0500] by kmaclean) ---
--- (Edited on 12/26/2006 2:55 pm [GMT-0500] by kmaclean) ---
Hey Ken,
Have you thought of any possible solutions for the bandwidth issue? I feel handicapped because I don't have easy access to the "source code" (audio in this case + scripts) that you use to generate the AM.
How large is your working copy currently?
Hmmm.... my only useful suggestions are sourceforge and/or bittorrent.
--- (Edited on 2007-04-05 23:45:56 [GMT-0400] by trevarthan) ---
Hi Jesse,
I've got a SourceForge account already set up, but it is currently not up to date. It only includes the scripts and the mfc files (which are much smaller than wav files) - but this is all you need to 'compile' the VoxForge Acoustic Models. SourceForge is not clear on the maximum storage they allow, so I was thinking that 100 GB of wav audio files (our first release target) would be too big. I plan to add a nightly update of the scripts and mfc files to SourceForge.
Regardless, all the scripts and Audio are now updated nightly (and have been for a while now...) on the VoxForge Repository server (a 1&1 account) - it mimics the directory structure in SVN, but uses tarballs for the audio (the tarballs do not include svn data). I have loads of bandwidth (2TB/month), so that is the preferred download method.
The Downloads directory (http://www.repository.voxforge1.org/downloads/) looks like this:
Nightly_Builds/ 05-Apr-2007 04:19 -
Tags/ 29-Dec-2006 16:35 -
Trunk/ 26-Dec-2006 13:02 -
builds/ 16-Oct-2006 11:19 -
large_audio_files/ 13-Mar-2007 23:34 -
mp3_test/ 09-Mar-2007 00:32 -
software/ 29-Mar-2007 13:53 -
speech_corpus/ 17-Oct-2006 11:06 -
For Audio, you would be interested in the Trunk/Audio directory (updated nightly from the VoxForge Website SVN server):
MFCC/ 13-Dec-2006 22:49 -
Main/ 13-Dec-2006 22:48 -
Original/ 13-Dec-2006 22:48 -
For the Scripts, you would be interested in the Trunk/Scripts directory (updated nightly from the VoxForge Website SVN server):
AudioBook_scripts.tgz 05-Apr-2007 04:19 110M
Audio_scripts.tgz 05-Apr-2007 04:18 34k
HTK.tgz 05-Apr-2007 04:18 20.2M
Metrics_scripts.tgz 01-Feb-2007 05:49 11k
Mirroring_scripts.tgz 05-Apr-2007 04:18 20k
Testing_scripts.tgz 09-Mar-2007 03:57 485k
I could also set you up with SVN access to the VoxForge SVN server if you would prefer (it would take some work to get WebDAV going), but even so, I would prefer that you only check out the scripts and maybe the mfcs ... downloading a big chunk of wav files would slow the server down noticeably for other users (I have not had a chance to look at bandwidth limitation by user or port yet ...), and I only have 100 GB of monthly bandwidth on this server.
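To put the two bandwidth figures side by side, here is a back-of-the-envelope sketch. It assumes the audio reaches the 100 GB first release target and takes 1 TB as 1000 GB; `full_downloads_per_month` is just an illustrative name.

```python
def full_downloads_per_month(monthly_bandwidth_gb, corpus_size_gb):
    """How many complete copies of the corpus fit in one month's bandwidth."""
    return monthly_bandwidth_gb // corpus_size_gb


CORPUS_GB = 100  # first release target for wav audio (assumed as the corpus size)

# SVN server: 100 GB of monthly bandwidth
print(full_downloads_per_month(100, CORPUS_GB))   # prints 1

# Repository server: 2 TB of monthly bandwidth, taken here as 2000 GB
print(full_downloads_per_month(2000, CORPUS_GB))  # prints 20
```

In other words, a single full wav checkout could use up the whole month on the SVN server.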
Ken
--- (Edited on 4/ 6/2007 10:14 am [GMT-0400] by kmaclean) ---
OK, so since the high bandwidth server isn't under our control we probably can't run SVN or rsync on it, right?
What if we broke the project into two separate svn projects? Source code, scripts, etc. go into one project, and audio into the other.
Then we can open up the source code project to developer access without worrying about bandwidth, and we can make the audio project downloadable from the high bandwidth server via something like `wget --mirror`. This way we get the following advantages:
1.) Can still get incremental updates of the audio in its native form without downloading the whole thing all over again (wget runs on win32 also, so it's multiplatform, if a bit out of the mainstream)
2.) We won't bog down voxforge's bandwidth with audio downloads
3.) We can still develop our scripts in a sane SVN environment.
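As a rough illustration of advantage 1, the sketch below shows the kind of timestamp comparison that makes an incremental mirror possible (`wget --mirror` turns on timestamping for the same purpose). It assumes listing lines shaped like the directory index quoted earlier in this thread; `parse_listing_line`, `stale_files`, and the `local_mtimes` dictionary are hypothetical names, not part of any VoxForge script.

```python
from datetime import datetime


def parse_listing_line(line):
    """Split 'name DD-Mon-YYYY HH:MM size' into (name, modified, size)."""
    name, date, time, size = line.split()
    # %b expects English month abbreviations under the default C/English locale.
    modified = datetime.strptime(f"{date} {time}", "%d-%b-%Y %H:%M")
    return name, modified, size


def stale_files(listing_lines, local_mtimes):
    """Return names that are missing locally or older than the remote copy."""
    stale = []
    for line in listing_lines:
        name, modified, _size = parse_listing_line(line)
        local = local_mtimes.get(name)
        if local is None or local < modified:
            stale.append(name)
    return stale


listing = [
    "HTK.tgz 05-Apr-2007 04:18 20.2M",
    "Metrics_scripts.tgz 01-Feb-2007 05:49 11k",
]
# Hypothetical local state: only the metrics tarball is present and current.
local_mtimes = {"Metrics_scripts.tgz": datetime(2007, 2, 1, 5, 49)}
print(stale_files(listing, local_mtimes))  # prints ['HTK.tgz']
```

Only the files reported as stale would need to be fetched again, which is what keeps repeat downloads small.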
Sounds like a win-win-win to me. We could either provide the audio in a non-gzipped format on the high bandwidth server, or we could continue providing it the way it is now and write a script to automatically unpack it into a usable dir structure.
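For the second option, an unpack script could look roughly like this. It is only a sketch: it assumes the mirrored tarballs are gzip-compressed `.tgz` files from a trusted source, and `unpack_tarballs`, `download_dir`, and `target_dir` are made-up names for illustration.

```python
import tarfile
from pathlib import Path


def unpack_tarballs(download_dir, target_dir):
    """Extract every .tgz under download_dir into a matching folder under target_dir."""
    download_dir, target_dir = Path(download_dir), Path(target_dir)
    unpacked = []
    for archive in sorted(download_dir.rglob("*.tgz")):
        # Mirror the archive's relative location, minus the .tgz suffix,
        # e.g. Trunk/Scripts/HTK.tgz -> Trunk/Scripts/HTK/
        dest = target_dir / archive.relative_to(download_dir).with_suffix("")
        dest.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Python versions: refuse unsafe paths inside the archive.
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)
        unpacked.append(dest)
    return unpacked
```

Run after each mirror pass, something like `unpack_tarballs("downloads", "working_copy")` would rebuild a usable tree next to the untouched tarballs, so the mirror itself stays comparable with the server.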
What do you think?
--- (Edited on 2007-04-06 11:30:05 [GMT-0400] by trevarthan) ---
The public VoxForge Subversion repository is now located here:
http://www.dev.voxforge.org/svn/Main
The corresponding Trac site is now located here:
http://www.dev.voxforge.org/projects/Main
$ svn checkout http://www.dev.voxforge.org/svn/Main/Trunk/Scripts/
Notes:
Ken
--- (Edited on 4/ 8/2007 10:27 pm [GMT-0400] by kmaclean) ---
--- (Edited on 4/12/2007 11:31 am [GMT-0400] by kmaclean) ---
--- (Edited on 5/3/2007 2:18 pm [GMT-0400] by kmaclean) ---
--- (Edited on 4/10/2007 5:06 am [GMT-0500] by gongdusheng) ---