VoxForge
Email from Sven:
Hi everyone,
I am a native German translator with 12 years' experience
translating technical texts from English to German. I am also a Linux system administrator (though administration
is not paying me yet ;)
As you can imagine, a reliable speech to text engine would make my life hundreds of times easier on many days.
So, while searching for free speech recognition software, I came across Julius and
VoxForge, both available in the Ubuntu repositories. I also studied the
VoxForge website a bit.
As I understood your info on the website, your *English* acoustic
model is currently relying on getting more user audio input to reach usability for an actual dictation application.
However, do you already possess a grammar model (the files sample.voca
and sample.grammar in your release) that is capable of more than a few
words and the digits? Is it possible to download such a voca file
somewhere, so I could at least test the reliability of your acoustic
model when it comes to dictation?
Is there any *working* dictation application available for Linux,
even if it is for English speech so I could get a picture of the
current reliability of the efforts?
And then: What effort would be needed to produce a reliable GERMAN acoustic model together with complete grammar files?
Where are you located? Do you have a German contact?
I would love to get into some deeper discussion about this with one of your engineers/maintainers.
Kind Regards,
Sven
my reply:
Hi Sven,
>As I understood your info on the website, your *English* acoustic
>model is currently relying on getting more user audio input to reach
>usability for an actual dictation application.
>However, do you already possess a grammar model (the files sample.voca
>and sample.grammar in your release) that is capable of more than a
>few words and the digits? Is it possible to download such a voca file
>somewhere, so I could at least test the reliability of your acoustic
>model when it comes to dictation?
>Is there any *working* dictation application available for Linux,
>even if it is for English speech so I could get a picture of the
>current reliability of the efforts?
>And then: What effort would be needed to produce a reliable GERMAN
>acoustic model together with complete grammar files?
>Where are you located? Do you have a German contact?
VoxForge is not a commercial enterprise, though Nickolay (nsh) has a company that does speech recognition consulting work.
You might want to try the VoxForge acoustic model creation tutorial to get
an idea of what is involved in creating an acoustic model.
I'd like to post this thread on the VoxForge forum to get feedback from others. Please let me know if that is OK.
thanks,
Ken
Edit: Fixed links
His reply:
Thanks for the answers, Ken! Of course, you can post this on the forum.
As I said, I am a technical translator (English > German) with many years of experience. Every
now and then, one of my colleagues brings up the question of speech recognition for dictation (usually after having typed,
yet again, some word like 'Kontrollkästchen' (check box) a hundred times in the last few minutes ;). But no one
has ever actually bought a dictation app, as far as I know. Translators are naturally fast typists and efficient
at using keyboard shortcuts to control the application they type in. So a recognition engine would have to be fast,
too, and allow a high degree of customization of its control commands. Also, our texts
are usually highly specialized, which might be a disadvantage when using "general" dictation apps, but might
actually be an advantage when trying to tailor the recognition engine (speech corpus?) to technical
texts: they contain a lot of technical terms, but the overall number of unique words is low, and the grammar
actually used in them is mostly simple.
I really do not know what market there would be to get some return on the efforts for a German dictation app.
I just know that, when I am facing tight deadlines for a project, my hands often get stiff, and my wrists
start to hurt after typing for hours and hours. At that point, I would sometimes welcome, and perhaps even pay
for, a dictation app if it worked fast and reliably.
thanks,
Sven
Hi Ken!
"You would also need a million+ word Language Model."
Yes, that is true. But currently, we have performance problems with large lexicons. One solution could be to learn from how OpenOffice.org spelling dictionaries are compressed: each dictionary is split into two files (.dic for the word list and .aff for the affix rules). In the long term, such an approach would help us. I think that pronunciation dictionaries for German, Latin, Dutch, Italian, and Spanish need a special compression method. It would be good if someone who knows how OpenOffice.org spelling dictionaries are compressed could provide prefix and suffix morphological rules. So at the moment, we are far from a solution for the German language.
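To make the stem-plus-affix idea above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the rule table, the flag name "N", and the SAMPA-like phoneme strings are all hypothetical, and this is not the actual OpenOffice.org .dic/.aff file format or a vetted German phone set. Each stem is stored once with a flag, and each suffix rule carries both the written suffix and the phonemes it adds.

```python
from typing import Dict, List, Tuple

# Hypothetical stem list: (written stem, phonemes, rule flags).
# Phoneme strings are illustrative only, not a vetted German phone set.
STEMS: List[Tuple[str, str, str]] = [
    ("Tisch", "t I S", "N"),
    ("Film", "f I l m", "N"),
]

# Hypothetical suffix rules: flag -> list of (written suffix, phoneme suffix).
RULES: Dict[str, List[Tuple[str, str]]] = {
    "N": [("e", "@"), ("es", "@ s"), ("en", "@ n")],
}


def expand(stems: List[Tuple[str, str, str]],
           rules: Dict[str, List[Tuple[str, str]]]) -> Dict[str, str]:
    """Expand stems and suffix rules into a full word -> phonemes lexicon."""
    lexicon: Dict[str, str] = {}
    for word, phones, flags in stems:
        lexicon[word] = phones  # the bare stem is itself a word
        for flag in flags:
            for written_suffix, phone_suffix in rules[flag]:
                lexicon[word + written_suffix] = f"{phones} {phone_suffix}".strip()
    return lexicon


lexicon = expand(STEMS, RULES)
for word, phones in sorted(lexicon.items()):
    print(f"{word}\t{phones}")
```

Two stored stems and three rules yield eight lexicon entries here. Real German morphology is harder than this sketch suggests: final devoicing, for example, changes the stem's own pronunciation when a suffix is added, so purely concatenative rules would not be enough.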
Hello Sven,
"Is there any *working* dictation application available for Linux"
No, there isn't.
"What effort would be needed to produce a reliable GERMAN acoustic model together with complete grammar files?"
Well, I could use help improving Ralf’s German dictionary (GPLv3; contains more than 300,000 German words).
I want to use sam to develop a German acoustic model. In my opinion, this would probably be the easiest approach.
Greetings,
Ralf
Hi Ralf,
>Yes, that is true. But currently, we have performance problems with
>large lexicons.
Language models and pronunciation dictionaries (i.e., pronunciation lexicons) are separate things.
A language model is a very large list of words/phrases with their probability of occurrence and is used in dictation applications. Simon is used for command and control and, I believe, uses grammar files.
Pronunciation dictionaries are used in the creation of acoustic models, and for pronunciation information in grammar files.
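The difference between the two can be sketched in a few lines of Python. This is a toy illustration, assuming a simple bigram language model estimated from counts; the three-sentence corpus and the phoneme strings are made up for the example and do not come from any VoxForge release.

```python
from collections import Counter

# Toy text corpus (hypothetical); a real dictation model needs far more text.
corpus = [
    "open the file",
    "close the file",
    "open the window",
]

# Language model: how likely is a word, given the word before it?
history_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    history_counts.update(words[:-1])           # every word that has a successor
    bigram_counts.update(zip(words, words[1:]))  # adjacent word pairs


def bigram_prob(prev: str, word: str) -> float:
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen histories."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]


# Pronunciation dictionary: how does each word sound?
# Phoneme strings are illustrative only.
PRONUNCIATIONS = {
    "open": "ow p ax n",
    "close": "k l ow z",
    "the": "dh ax",
    "file": "f ay l",
    "window": "w ih n d ow",
}

print(bigram_prob("the", "file"))   # "the" is followed by "file" in 2 of 3 cases
print(PRONUNCIATIONS["file"])
```

The language model says nothing about how a word sounds, and the pronunciation dictionary says nothing about which word is likely to come next; a recognizer needs both, plus the acoustic model.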
Ken
P.S. thanks for letting me know about the link problems... they worked fine in the email, but something happened when I copied them to the WebGUI forum... must be a feature :)