Hi!
I was looking at the submitted French speech data, and I saw that only part of it is in the VoxForge repository for download. The rest seems to be in the upload directory, which is access-restricted, so it is not easy to recover the whole corpus except manually from the download page.
Is there a specific reason for that, or is there a way to get the corpus easily? I saw this post where Ken says:
Unfortunately I have not moved any German audio to subversion.
However, here is quick and dirty way to get the audio:
1. $ wget -r -l2 http://www.voxforge.org/home/downloads/speech/german-speech-files -A "ralfherzog*"
This will create a directory called www.voxforge.org
2. Search the directory for *.zip files using GNOME's search tool, and drag the results to the directory you want.
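For anyone without a graphical search tool, step 2 can probably be done from the command line as well. This is only a rough sketch: `collect_zips` is a made-up helper name, and the source and destination directories in the example are placeholders, not paths anyone in this thread has confirmed.

```shell
# Hypothetical helper: move every .zip found under a mirrored tree
# into a single destination directory.
collect_zips() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    # -type f skips directories; each matching file is moved in turn.
    # Keep $dest outside $src so find does not revisit moved files.
    find "$src" -type f -name '*.zip' -exec mv {} "$dest"/ \;
}

# Example (placeholder paths):
# collect_zips www.voxforge.org ./german-zips
```

Note that two archives with the same file name in different subdirectories would overwrite each other in the destination.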
I'm not a wget expert but I don't think it's going to get files which are not in the specified directory. Any help?
Thanks a lot!
Marion
Hello Marion
> Is there a specific reason for that
Ken is a bit busy nowadays :) Let's not distract him.
I think
wget -r -l2 http://www.voxforge.org/home/downloads/speech/french-speech-files
will just work for you.
That's what I did, but as I said before, most of the corpus is in the upload directory, and to access it you need the complete address of each zip file, like http://voxforge.org/uploads/q0/0Q/q00QgKBqYb4KK6_qzhITig/phil_be-20090310-mif.zip, so you can't just do
$ wget -r -l3 http://www.voxforge.org/uploads
I found a solution using WinHTTrack, but wget should have worked too; you just have to download a lot of stuff and then delete everything but the zip files you want.
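Since the complete address of each zip file is what matters, one alternative might be to pull those addresses out of the download page and hand the list to wget. This is an untested sketch that assumes the page's HTML contains absolute links to the zip files; `extract_zip_urls` and `urls.txt` are names made up for illustration.

```shell
# Hypothetical helper: read HTML on stdin and print each absolute
# .zip URL it contains, one per line, with duplicates removed.
extract_zip_urls() {
    # -E: extended regex; -o: print only the matched part of each line.
    grep -Eo 'https?://[^" <>]+\.zip' | sort -u
}

# Example (assumes the page links to the zip files directly):
# wget -q -O - http://www.voxforge.org/home/downloads/speech/french-speech-files \
#     | extract_zip_urls > urls.txt
# wget -i urls.txt
```

If the links on the page turn out to be relative, or spread over several pages, the pattern and the list of pages to fetch would need adjusting.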
I just wanted to point out that not all the French data is in the repository, but I perfectly understand that you don't have time to process it all!
Thanks anyway for the answer and this project, this is great!
Marion
Hi Marion,
>I just wanted to point out that not all French data is in the repository,
All the French submissions are now in the repository:
http://www.repository.voxforge1.org/downloads/fr/Trunk/Audio/
Ken
Hi Ken,
Oops, I wanted to download the French corpus today, but I made a mistake and downloaded much more than expected.
I hope not to have been the cause of trouble for your server.
Sorry about that.
Samuel-
Hi Samuel,
No worries, that is why the VoxForge repository is on a separate server from the website front-end.
Ken