VoxForge was set up to collect transcribed speech for use with Free and Open Source speech recognition engines (on Linux, Windows and Mac).

We will make all submitted audio files available under the GPL, and then 'compile' them into acoustic models for use with Open Source speech recognition engines such as CMU Sphinx, ISIP, Julius and HTK (note: HTK has distribution restrictions).

Why Do We Need Free GPL Speech Audio?

Most acoustic models used by 'Open Source' speech recognition (or Speech-to-Text) engines are closed source: they do not give you access to the speech audio and transcriptions (i.e. the speech corpus) used to create the acoustic model.

The reason is that Free and Open Source ('FOSS') projects generally have to purchase large speech corpora that come with restrictive licensing. Although a few small FOSS speech corpora exist that could be used to create acoustic models, the vast majority of corpora (especially the large ones best suited to building good acoustic models) are only available for purchase under restrictive licenses.

How Can You Help?

Record yourself reading some text and upload your recordings to VoxForge.

Other Options.

News

By kmaclean - 5/26/2015 I would like to thank mrt_doulaty for the Farsi (Persian) translations of the VoxForge web site and speech submission applet.

By kmaclean - 4/28/2015 VoxForge is now mirroring the LT and the Telecooperation group Open Speech Data Corpus for German, with 35 hours of speech from about 180 speakers.

By kmaclean - 6/17/2014 We would like to thank Thawte for renewing the code signing certificate for the VoxForge speech submission applet for another 2 years.