VoxForge
Hi,
Does anyone know if it is possible to do speech recognition in two languages simultaneously? Let's say my two languages are English and Spanish; the machine should be capable of recognizing the word "hello" or "hola" (hello in Spanish)...
To do this, should I adapt the English acoustic model with my voice in Spanish (and add the Spanish phonemes), or is there another way?
Thanks in advance.
--- (Edited on 11/8/2008 5:48 pm [GMT-0600] by ubanov) ---
Hi ubanov,
>is it possible to do speech recognition in two languages simultaneously
Your best bet would be to create separate acoustic models, one for each target language, and run separate instances of the speech recognition engine (one for each model) against the same audio. I have read about cases where separate acoustic models are used for male and female speakers, and the most likely recognition result is taken (at the application level).
I suppose you could create a combined acoustic model for two different languages, but I am not sure how well this would scale. My sense is that it is best to keep your acoustic model as specific as (reasonably) possible to the domain your recognition engine will be working in.
However, for a small grammar, it might work... give it a try and let us know how you make out :)
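At the application level, the "take the most likely result" step could look something like this minimal Python sketch. Note that the (text, score) pairs are hypothetical stand-ins for whatever the two engine instances report; real engines each have their own score format:

```python
# Minimal sketch: pick the most likely hypothesis from two recognizers
# run in parallel (one per language). The (text, score) pairs here are
# hypothetical -- real engines report scores in their own formats.

def best_hypothesis(results):
    """results: dict mapping language -> (recognized_text, log_score).

    Returns (language, text) of the highest-scoring hypothesis;
    a higher (less negative) log score wins.
    """
    lang = max(results, key=lambda k: results[k][1])
    return lang, results[lang][0]

if __name__ == "__main__":
    # Pretend the English model scored "hello" better than the
    # Spanish model scored "hola" for the same audio.
    parallel = {
        "en": ("hello", -1200.5),
        "es": ("hola", -1350.0),
    }
    print(best_hypothesis(parallel))  # ('en', 'hello')
```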
Ken
--- (Edited on 11/12/2008 3:33 pm [GMT-0500] by kmaclean) ---
I have tried to run two separate julian processes, one with Spanish and the other with English, and when I run the second instance the program tells me that the DSP device is busy:
include config: julian.jconf
###### check configurations
###### initialize input device
adin_mic_standby: device=/dev/dsp
adin_mic_standby: open device: Device or resource busy
Error: failed to ready input device
Terminated
Maybe you mean to run the two recognizers against a wav file?
--- (Edited on 11/20/2008 6:17 pm [GMT-0600] by ubanov) ---
Hi ubanov,
>Maybe you mean to run the two recognizers against a wav file?
I was thinking that you might run two Julius servers (one using a Spanish acoustic model, the other an English one), and modify the Julius adinet client to send audio to both servers simultaneously.
Another approach *might* be to run Julius in multithreaded mode, with a different acoustic model for each instance (not sure how easily this could be done...). Enrique's post on the Julius forum says that he had 15 recognizers running in parallel. But you would still need to have a unique port for each instance, and modify adinet to send the audio to both ports.
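The core of such a modified adinet client is just fanning the captured audio out to every server. I don't have the exact adinnet wire format in front of me, so the sketch below only shows the fan-out idea, with file-like objects standing in for the sockets:

```python
# Sketch of the fan-out idea behind a modified adinet client:
# every chunk of captured audio is written to *all* servers
# (one Julius instance per language). File-like objects stand in
# for the network sockets; the adinnet wire format itself is not
# reproduced here.

def fan_out(audio_chunks, sinks):
    """Write each chunk of audio to every sink, in capture order.

    Returns the number of bytes delivered to each sink.
    """
    total = 0
    for chunk in audio_chunks:
        for sink in sinks:
            sink.write(chunk)
        total += len(chunk)
    return total

if __name__ == "__main__":
    import io
    spanish_server = io.BytesIO()
    english_server = io.BytesIO()
    sent = fan_out([b"\x00\x01", b"\x02\x03"],
                   [spanish_server, english_server])
    print(sent)  # 4
    print(spanish_server.getvalue() == english_server.getvalue())  # True
```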
This might be a good question for Lee Akinobu (the Julius maintainer... ask on his forum: http://julius.sourceforge.jp/forum).
Ken
--- (Edited on 11/23/2008 10:55 pm [GMT-0500] by kmaclean) ---
Hi ubanov,
Just a clarification ... when I say "run two Julius servers" I mean run two Julius server mode processes on the same PC (not on two different PCs).
Ken
--- (Edited on 11/25/2008 1:24 pm [GMT-0500] by kmaclean) ---
Last week I tested a commercial voice recognition program. In that program the two-language issue is handled in the grammar of your application: by putting the language between brackets, it is possible to recognize some words in one language and others in another.
And you may ask, what's the point of having recognition in two languages? I'm going to use the example that came with the program I tested.
Suppose we want to build a voice directory for a company. I live in Spain, so all the names might be Spanish ones. But I live in the Basque Country, so there are a lot of names that are not Spanish but Basque. And suppose an English person works in the company... then we would need three languages (because many people know a little English and may call that person using the English pronunciation of the name...).
Maybe I need to ask the Sphinx or Julius developer teams to include this functionality in the future...
Thanks
--- (Edited on 11/29/2008 6:02 pm [GMT-0600] by ubanov) ---
Hi ubanov,
"then we will need 3 languages" - maybe you should take a look at the PLS (Pronunciation Lexicon Specification). It is possible to have multiple pronunciations for the same orthography. The trick is to use the IPA (and not some Spanish-specific subset of phones/phonemes) so that you can have foreign words (e.g. foreign names) in your Spanish dictionary. A specific foreign name (e.g. "Newton") could then be transcribed in several ways, even with a Spanish accent. How would a Spanish person who doesn't know any English pronounce the word "Newton"? The pronunciation might be "wrong", and exactly this wrong pronunciation could be included in your Spanish dictionary.
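For illustration, a PLS 1.0 lexicon entry can carry several pronunciations for one grapheme. The IPA transcriptions below are made up to show the shape of the markup, not verified phonetics:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="es">
  <lexeme>
    <grapheme>Newton</grapheme>
    <!-- English-like pronunciation -->
    <phoneme>ˈnjuːtən</phoneme>
    <!-- A hypothetical Spanish-accented pronunciation -->
    <phoneme>ˈneu̯ton</phoneme>
  </lexeme>
</lexicon>
```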
Unfortunately, Sphinx, HTK, and Julius aren't PLS-compatible. But you can learn a lot from reading the PLS specification. At least, I did. The PLS might give you a push in the right direction to solve your pronunciation problems.
Ken, the idea of two or more Julius servers (running on a single machine or, theoretically, on multiple machines) is a different approach. But in my opinion, it won't work. Why not? Consider the following: as far as I know, no commercial speech recognition software is capable of understanding multiple languages at the same time (and switching between them automatically). It is too complicated for the moment. Maybe in a few years or so, but not now.
I think that the PLS could help us with the integration of foreign words into our national pronunciation dictionaries. A lot of words in the German language are loan words - the PLS/IPA allows us to integrate loan words (with foreign phonemes) into the pronunciation dictionary.
Greetings, Ralf
--- (Edited on 2008-11-30 12:56 pm [GMT-0600] by ralfherzog) ---
Hi Ralf,
>Ken, the idea with two or more Julius servers [...] is a different
>approach. But in my opinion, it won't work.
This is the approach used in the CMU LetsGo app to help improve recognition between male and female speakers - I just assumed the same approach should be used in a multi-lingual environment, from their docs:
We use the CMU Sphinx II speech recognizer with gender-specific telephone-quality acoustic models from the Communicator system. The data used for training consists of the CMU Communicator data collected over the last 4 years. We automatically split this data into male and female speech and trained separate models. Both models are then run in parallel and the best is selected. Like others, we have found this improves recognition accuracy.
>no commercial speech recognition software is capable to understand
>multiple languages at the same time
You might be right about dictation speech recognition, but I believe that multilingual commercial telephony VoiceXML IVR systems exist. A quick Google search shows that Telisma offers multilingual ASR, from their site:
The multi-language recognition engine can handle as many languages as you like, and even within a single call. In particular it can recognise pronunciation errors made by foreign speakers.
I quickly looked at Nuance's site, and didn't see anything, but that is not to say that they don't provide such a service to telcos in multilingual environments....
>I think that the PLS could help us with the integration of foreign words
>into our national pronunciation dictionaries.
I don't think that Sphinx or HTK/Julius (or any other open source speech recognition engine) supports PLS.
Ken
--- (Edited on 12/1/2008 11:28 am [GMT-0500] by kmaclean) ---
--- (Edited on 12/1/2008 11:42 am [GMT-0500] by kmaclean) ---
Hi Ubanov,
>we want to build a voice directory for one enterprise
Aahh... I think you can do this with Sphinx or HTK/Julius with no changes to the code...
I assumed you were working something like a VoiceXML VR app that had menus and selections that would be in different languages...
A directory application that uses proper names to direct callers is a little unique since you usually have to manually create pronunciations for the names... so you might be better off with *one* acoustic model.
Really, what you are looking at are three different pronunciations of the same proper names: Spanish, Basque, and English. You could do this by recording people speaking the names in all three languages, and creating a pronunciation dictionary that includes the phonemes for each language (using SAMPA or IPA to help you create unique phonemes).
For HTK/Julius (I am not sure about Sphinx...) you could create a return word with a language tag that your app would parse to determine which language your users use the most (so you can tune things - i.e. get more samples of Basque if recognition rates are not so good...). For example (these are very made-up phonemes...):
UBANOV [UBANOV-EN] uh b ah n oh v
UBANOV [UBANOV-VA] u bh n a vh
UBANOV [UBANOV-ES] ui b ae n i vh
Ken
--- (Edited on 12/1/2008 11:14 am [GMT-0500] by kmaclean) ---
Hi Ubanov,
You might also want to check this paper: Acoustic Modeling and Training of a Bilingual ASR System when a Minority Language is Involved (just google "multilingual ASR" or "bilingual ASR" for more papers)
The two languages they modelled were Spanish and Galician.
One of their conclusions is interesting:
Results show that the amount of training data is a fundamental factor to develop a accurate speech recognition system. Furthermore, it is possible to partially avoid the lack of a sufficiently large database by using a speech database available in another but phonetically similar language.
Ken
--- (Edited on 12/1/2008 2:05 pm [GMT-0500] by kmaclean) ---
--- (Edited on 12/1/2008 2:07 pm [GMT-0500] by kmaclean) ---