VoxForge
Hi colore,
Sphinx, ISIP, HTK, and Julius can all do languages other than English. It's just that you will need to create Acoustic Models and Language Models for the target language.
The Julius Speech Recognition Engine is the only one of the four listed above that explicitly states that it can perform dictation. However, its developers only supply Japanese Acoustic Models and Language Models.
You might want to start with the VoxForge Create Acoustic Model Tutorial to get a better understanding of what is involved in creating an English Acoustic Model for Julius (using the HTK toolkit). Then try creating a Greek Acoustic Model using your own voice, and test it with a Greek grammar. Once you have accomplished this, go to the HTK Book for information on creating Language Models for dictation.
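For anyone new to the term: a dictation Language Model is usually a statistical n-gram model that estimates how likely a word is given the words before it. The sketch below is only an illustration of that idea in Python, not HTK's tooling. The function names and the sample sentences are made up for this example, and a real dictation model would also need smoothing and back-off for unseen word pairs, which are left out here.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and word pairs over whitespace-tokenised sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # <s> and </s> are conventional sentence start/end markers.
        words = ["<s>"] + sentence.split() + ["</s>"]
        history_counts.update(words[:-1])
        bigram_counts.update(zip(words[:-1], words[1:]))
    return history_counts, bigram_counts

def bigram_prob(history_counts, bigram_counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]
```

Trained on the three sentences "open the file", "open the door" and "close the door", this gives P(the | open) = 1.0 and P(door | the) = 2/3. Any pair that never occurred gets probability zero, which is exactly the problem smoothing is meant to solve.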
It's not easy, but it can be done,
Ken
--- (Edited on 4/23/2007 1:43 pm [GMT-0400] by kmaclean) ---
I've built a Greek LVCSR system in the past. One of the advantages that you have is that just a few rules get you from the Greek letters to pronunciations (unlike English).
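To give a feel for what "a few rules" can look like, here is a rough Python sketch of rule-based letter-to-sound conversion for Greek. It is not the system described above. The phone labels are hypothetical ASCII names chosen for this example, the rule set is deliberately incomplete (for instance, it ignores the αυ/ευ combinations, dieresis, and voicing assimilation), and a real pronunciation dictionary would need more care.

```python
import unicodedata

# Hypothetical ASCII phone labels; a real system would use its own phone set.
# Two-letter combinations are checked before single letters.
DIGRAPHS = {
    "ου": ["u"], "αι": ["e"], "ει": ["i"], "οι": ["i"],
    "μπ": ["b"], "ντ": ["d"], "γκ": ["g"], "τσ": ["ts"], "τζ": ["dz"],
}
LETTERS = {
    "α": ["a"], "β": ["v"], "γ": ["gh"], "δ": ["dh"], "ε": ["e"],
    "ζ": ["z"], "η": ["i"], "θ": ["th"], "ι": ["i"], "κ": ["k"],
    "λ": ["l"], "μ": ["m"], "ν": ["n"], "ξ": ["k", "s"], "ο": ["o"],
    "π": ["p"], "ρ": ["r"], "σ": ["s"], "ς": ["s"], "τ": ["t"],
    "υ": ["i"], "φ": ["f"], "χ": ["x"], "ψ": ["p", "s"], "ω": ["o"],
}

def strip_accents(word):
    """Lowercase and drop combining marks (this also drops dieresis, a simplification)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def greek_to_phones(word):
    """Map a Greek word to a list of phone labels using the tables above."""
    text = strip_accents(word)
    phones = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            phones.extend(DIGRAPHS[pair])
            i += 2
        elif text[i] in LETTERS:
            phones.extend(LETTERS[text[i]])
            i += 1
        else:
            i += 1  # skip anything that is not a Greek letter
    return phones
```

For example, `greek_to_phones("καλημέρα")` returns `['k', 'a', 'l', 'i', 'm', 'e', 'r', 'a']`. An English equivalent would need a large exception dictionary, which is the contrast being drawn here.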
Good luck,
Tony
--
Dr Tony Robinson, CEO Cantab Research Ltd
Phone: +44 845 009 7530, Fax: +44 845 009 7532
--- (Edited on 23-April-2007 7:04 pm [GMT+0100] by Tony Robinson) ---
The system I built (for my previous company) is "enterprise-class" commercial software, i.e. priced for corporate, not personal, budgets.
You might get lucky and persuade a Greek university to part with the acoustic and language models needed, but somehow I doubt it as there can be quite a lot of support issues if you are not used to speech recognition (and even if you are).
Which gets you back to Ken's post. If I were you I'd follow his advice: build a system in English based on the code and data on this site, then, when you are confident you know what you are doing, record your own voice and build a Greek system.
Tony
--- (Edited on 23-April-2007 7:30 pm [GMT+0100] by Tony Robinson) ---
I forgot to mention HDecode (included with HTK version 3.4), a large vocabulary speech recognition decoder that should be able to perform dictation. However, it has license restrictions that limit the use of the software and of any acoustic models generated with it to research purposes only.
Another option is the Spice Project (still under construction), which is working on a web site that will provide the ability to create an Acoustic Model, Language Model and Dictionary in the language of your choice, for use with the Janus Speech Recognition Engine. Unfortunately, Janus is not open source, but you might want to contact them to get a Janus run-time. Spice was reviewed in this post.
Ken
--- (Edited on 4/24/2007 9:53 am [GMT-0400] by kmaclean) ---
--- (Edited on 4/25/2007 4:46 pm [GMT-0500] by Visitor) ---
Hi colore,
This was discussed in this post.
I can't really tell you which is better. I have not done any performance comparisons.
I'm biased towards HTK/Julius, because that was the first package I started working on. I picked HTK because it seemed to have the best documentation (at the time ...).
Julius is supposed to work in dictation applications (in Japanese at least ...) - once VoxForge has robust enough Acoustic Models, we will be able to test this claim for English. It can also be used in Command and Control applications and telephony applications using its Julian module.
On the other hand, Sphinx has a larger community, but it tends to be used more in command and control or telephony applications. The xVoice project tried to use Sphinx-2 for dictation (trying to replace IBM's closed source ViaVoice), but gave up on it.
I don't have much experience with ISIP yet.
So it really depends on what you are trying to do, and how much help you will need to do it ...
Ken
--- (Edited on 4/30/2007 4:37 pm [GMT-0400] by kmaclean) ---
Hi colore,
You might find this article of interest:
A COMPARISON OF PUBLIC DOMAIN SOFTWARE TOOLS FOR SPEECH RECOGNITION, by Samudravijaya K and Maria Barot.
They compare HTK and Sphinx for a Hindi speech recognition system. They conclude that "although recognition accuracies of the two systems are comparable, [they] observe that the acoustic modeling of Sphinx is superior".
In my experience, Julius (even though it uses Acoustic Models created using the HTK toolkit) performs much better than HTK in grammar-related tasks. I don't know how well it compares against the Sphinx group of speech recognizers.
Ken
--- (Edited on 5/1/2007 3:35 pm [GMT-0400] by kmaclean) ---