VoxForge
Hi Jonas,
>We have quite a lot of tools and corpora, but unfortunately not much time to handle it at the moment. Is there a time plan for when there will be possibilities for people to donate their recordings etc in other languages?
If you would like us to host your corpora on VoxForge, yes, this can be done. If it is a large corpus, I can set up an FTP link for you to upload your speech, transcriptions, pronunciation dictionary, and tools. If it is not that large, I can set up a forum for Swedish, and you can upload files as time permits, and I can put them into the SVN repository.
>If I (for example) arrange a phone number etc and take care of the recordings and so on, would it then be possible to admit information and resources from these web pages for such a project?
Yes. Note that we currently have an automated speech collection script that works with Asterisk and submits the audio automatically to a VoxForge Forum - see the VoxForge IVR project (many thanks to trevarthan for developing this app). You could modify the prompts to suit your needs.
thanks,
Ken
Well, there is a phonetic dictionary, I suppose, and a language model can be easily constructed. So you can start right now with recording transcribed speech. Any free text from gutenberg.org is acceptable. It should be split into sentences.
Once there is audio data, it's possible to train a model.
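For what it's worth, here is a minimal sketch of the sentence-splitting step in Python, assuming the source text is already saved as plain text. The function name split_into_sentences and the numbered prompt format are made up for illustration and are not something VoxForge prescribes. The split rule is a rough heuristic (it will break on abbreviations such as "Dr. Smith"), so the output should be checked by hand before recording.

```python
import re


def split_into_sentences(text):
    """Naively split plain text into sentences (hypothetical helper)."""
    # Collapse line breaks and runs of whitespace from the source text.
    flat = re.sub(r"\s+", " ", text).strip()
    if not flat:
        return []
    # Split after '.', '!' or '?' when the next word starts with a capital.
    # Rough heuristic only: abbreviations will be split incorrectly.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-ZÄÖÜ])", flat)
    return [p for p in parts if p]


sample = "It was a dark night.\nThe wind howled! Was anyone\nthere? Nobody answered."

# Print one numbered prompt per line (the numbering scheme is an assumption).
for number, sentence in enumerate(split_into_sentences(sample), start=1):
    print(f"{number:04d} {sentence}")
```

Each printed line can then serve as one prompt to read aloud, with the same line kept as its transcription.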
Hi ralfherzog,
Done ... I've added a new forum for submitting German Speech Files.
You should be able to find German translations of the GPL license on the fsf.org site - please include both German and English versions of it in your submission.
You might also want to talk to the folks at the Simon project (dialog manager that uses Julius). They are working on creating German Acoustic Models using HTK.
thanks,
Ken
Hi Ralf,
I've created a dev site for German at:
http://www.dev.voxforge.org/projects/de (trac site)
http://www.dev.voxforge.org/svn/de (subversion site)
This is basically a Subversion site (used for software version control) with a Trac front-end. Trac is nice because it provides a simple-to-use wiki environment. I will send you a password so you can log on and make changes (you don't actually need to log on to make changes, but then the wiki won't keep track of who made which changes; there are also some admin functions that require a log on).
With respect to creating a German version of the VoxForge site under something like de.voxforge.org, I need to think about how to structure this so that you could make the updates. WebGUI (the content management system front-end) is a little difficult to learn at first, but very powerful.
Once you have a few hours of audio (from different users), we can look at creating something like "de.voxforge.org".
all the best,
Ken
Great, nice to see such progress :)
A few thoughts about German. The CMU people were going to share the framework and models trained on Verbmobil (a large German database):
http://www.speech.cs.cmu.edu/sphinx/twiki/bin/view/Sphinx4/GermanAcousticModel
http://sourceforge.net/forum/message.php?msg_id=4279928
I suppose it will take years for them to make a decision :(
A German dictionary is available here:
http://www.ims.uni-stuttgart.de/phonetik/synthesis/
under a restrictive license, but we can probably use it for bootstrapping. Under the GPL we only have the rules from eSpeak, I suppose. The same situation as with Dutch.
Hi Timo,
>5. Some more administrative stuff: I am unable to edit the dev wiki. Do I need an extra account for that?
Yes.
Up until about 1-2 months ago, I had mod_security working perfectly to catch spammers on the Trac dev wiki, and allow users to post without signing in. But when I upgraded the distro on the server, I could not get it to work properly... :( I need to spend some more time on this.
I will send you an email with a password to allow you to update the wiki.
>6. What about a sub forum for the German language? This would improve
>both our communication as well as the visibility for this language's sub project.
Certainly, what did you have in mind?
I could add a separate section on the Forums Page called "International" or something like that. And have specific forums for each language we support.
I'd also like to use the proper labels for each language (I've been too English-centric up until now ...) - should the German forum be called the "Deutsch Forum"?
Ken
Hi Ken,
> I will send you an email with a password to allow you to update the wiki.
Might this have slipped off your to-do list?
> > 6. What about a sub forum for the German language?
> Certainly, what did you have in mind?
Well, I think I have figured out that in order to be on par with Dutch/Italian/etc., we'd just have to post one level higher, instead of commenting in the other languages thread. But then again: it would be cooler to have separate forums for each language within the other language forum, so that we can have different threads for each language (which would automatically improve our visibility, as only new threads are shown in the recent posts section). Is it possible with the forum engine to have a hierarchical forum structure? Otherwise we could move "established" languages to the top level (probably with a common prefix so that they all stand next to each other).
I personally don't care if it's "international" or "other" languages. I kind of thought, though, that English is quite an international language by itself? As for the forum labels: I really prefer "German" over "Deutsch". Otherwise we would only discourage speakers of other languages from reading the contents of the other sub forums. I can probably learn a lot by reading through the Italian forums, but only if I can read them.
Greetings from Berlin!
Timo