VoxForge
Hi Ralf,
thanks for the feedback.
>language model or acoustic model - I don't know the difference
From the VoxForge Tutorial:
All Speech Recognition Engines ("SRE"s) are made up of the following components:
- Language Model or Grammar - Language Models contain a very large list of words and their probability of occurrence in a given sequence. They are used in dictation applications. Grammars are a much smaller file containing sets of predefined combinations of words. Grammars are used in IVR or desktop Command and Control applications. Each word in a Language Model or Grammar has an associated list of phonemes (which correspond to the distinct sounds that make up a word).
- Acoustic Model - Contains a statistical representation of the distinct sounds that make up each word in the Language Model or Grammar. Each distinct sound corresponds to a phoneme.
- Decoder - Software program (like Sphinx, Julius, HTK's HVite) that takes the sounds spoken by a user and searches the Acoustic Model for the equivalent sounds. When a match is made, the Decoder determines the phoneme corresponding to the sound. It keeps track of the matching phonemes until it reaches a pause in the user's speech. It then searches the Language Model or Grammar file for the equivalent series of phonemes. If a match is made, it returns the text of the corresponding word or phrase to the calling program.
>So why not integrate all of them into the VoxForge speech submission application?
Unfortunately, we are getting to the point where I need to create separate builds of the SpeechSubmission app for each language, otherwise the size of the downloadable application will get too big. I will add this as an RFE in Trac.
Ken
Hi Ralf,
>How is it possible to use this pronunciation lexicon to create a first edition of the German acoustic model?
The VoxForge Tutorial shows how to do it for English. You should be able to create a workable triphone acoustic model by doing Steps 1-9, using German prompts and a German pronunciation dictionary.
To be able to complete Step 10 and create tied-state acoustic models you need a German tree.hed script. For more information on how to create a tree.hed file for a new language, see the following links:
>Is there any one who can do this job?
Unfortunately, I can't do this right now. My current focus is segmenting all the LibriVox audiobook submissions - some date back to June of last year :( , and squeezing in another release of the speech submission app (for Italian and Russian). So it will be a while before I can look at this.
> It could be a workaround to eliminate those sentences which contain
>special characters of the German language.
Thanks for the suggestion (I like easy workarounds ...) but there must be an easy way to address this in Java - some Unicode settings that I have missed ...
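If the workaround is needed in the meantime, it could be sketched on the command line roughly like this. This is only a sketch under assumptions: the prompts are assumed to be in a UTF-8 text file with one sentence per line, the script itself is assumed to be saved as UTF-8, and filter_plain_prompts is a made-up helper name, not part of the speech submission app.

```shell
#!/bin/sh
# filter_plain_prompts FILE
# Print only the lines of FILE that contain none of the German special
# characters (umlauts and sharp s). Hypothetical helper for illustration.
filter_plain_prompts() {
    # -F treats each pattern as a fixed string, so the match is done on
    # the raw UTF-8 byte sequences; -v keeps the lines that do NOT match.
    grep -v -F -e 'ä' -e 'ö' -e 'ü' -e 'Ä' -e 'Ö' -e 'Ü' -e 'ß' "$1"
}
```

For example, filter_plain_prompts prompts.txt > prompts-plain.txt (both file names are placeholders) would keep only the sentences without those characters.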
Ken
Well, I could build a model this weekend. Until then, you should probably install and try pocketsphinx, either on Windows or, better, on Linux. As for the language model, filtering is a trivial step that language modelling toolkits already perform, and you only need to use one of them. We'll return to this later, once we have an acoustic model.
Hi nsh,
Unfortunately I have not moved any German audio to subversion.
However, here is a quick and dirty way to get the audio:
1. $ wget -r -l2 http://www.voxforge.org/home/downloads/speech/german-speech-files -A "ralfherzog*"
This will create a directory called www.voxforge.org.
2. Search the directory for *.zip files using Gnome's search tool, and drag the results to the directory you want.
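If you would rather stay on the command line than use Gnome's search tool, step 2 can be sketched with find. Treat this as a rough sketch: collect_zips is a made-up helper name, and the destination directory is whatever you choose.

```shell
#!/bin/sh
# collect_zips SRC DEST
# Copy every *.zip file found anywhere under SRC into the single
# directory DEST. Hypothetical helper, not part of any VoxForge tool.
collect_zips() {
    src=$1
    dest=$2
    # Create the destination if it does not exist yet.
    mkdir -p "$dest" || return 1
    # -type f skips directories; cp is run once per matching file.
    find "$src" -type f -name '*.zip' -exec cp {} "$dest"/ \;
}
```

After the wget run above, something like collect_zips www.voxforge.org german-audio (the second argument is a placeholder) would gather the archives in one place. Files with the same name in different subdirectories would overwrite each other, so check for that if it matters.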
Ken