VoxForge
Just curious,
Has there been any work on the following:
Or is this just what 'voice morphing' is?
I'm guessing you lose a lot of prosody, but since you don't need to translate the phonemes into words, just back into different speech, it doesn't matter so much if someone speaks something unknown?
- Bundabrg
--- (Edited on 9/10/2009 6:47 pm [GMT-0500] by bundabrg) ---
Hi Bundabrg,
>1. User speaks a sentence (presumably into a speech recognizer [...]
>2. It is recognized into its phonemes (use best guess)
>3. Using a different voice, the phonemes are translated back to speech.
>[...] but as you don't need to translate the phonemes into words, just
>back to different speech it doesn't matter so much if someone
>speaks something unknown?
Speech recognition engines need context to help them recognize speech. Individual phonemes are difficult to recognize. Phonemes which are grouped together, like in triphones or in words or phrases (assuming a grammar), are easier to recognize because they are more uniquely identifiable - therefore you get better accuracy. So this *might* be a case of "garbage recognized, garbage pronounced"...
Regardless, you could experiment with this in limited form by writing a script (Perl, Python, Ruby...) that ties Julius or Sphinx to Festival. Recognize using a simple grammar, take the phonemes from the actual Julius or Sphinx output, and have Festival pronounce them (which I believe it can do...)
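As a rough starting point, here is a minimal Python sketch of that glue script. It rests on several assumptions, so treat it as an illustration only: it assumes the recognizer prints a Julius-style "phseq1:" line with "|" between words, that Festival's SayPhones function and its --pipe option are available in your install, and that the recognizer's phoneme labels match the phone set of the Festival voice (in practice you will probably need a mapping table). The function names and the SILENCE_LABELS set are made up for this example.

```python
import subprocess

# Assumed silence labels emitted by the recognizer; adjust for your acoustic model.
SILENCE_LABELS = {"silB", "silE", "sil", "sp"}


def parse_phseq(line):
    """Extract phoneme labels from a Julius-style 'phseq1: ...' output line."""
    _, _, body = line.partition(":")
    phones = []
    for token in body.split():
        if token == "|":  # word boundary marker, not a phoneme
            continue
        # Map recognizer silences to Festival's pause phone; lowercase the rest.
        phones.append("pau" if token in SILENCE_LABELS else token.lower())
    return phones


def festival_command(phones):
    """Build a Scheme expression asking Festival to speak a list of phones."""
    return "(SayPhones '(" + " ".join(phones) + "))"


def speak(phones):
    """Send the expression to Festival on stdin (assumes 'festival' is on PATH)."""
    subprocess.run(
        ["festival", "--pipe"],
        input=festival_command(phones),
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical recognizer output line, for illustration only.
    example = "phseq1: silB | hh ax | l ow | silE"
    print(festival_command(parse_phseq(example)))
```

Calling speak(parse_phseq(line)) on each recognized line would close the loop. Since the phonemes are passed along without a dictionary lookup, any recognition errors go straight into the synthesized speech.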
Let us know how you make out!
thanks,
Ken
--- (Edited on 9/14/2009 4:47 pm [GMT-0400] by kmaclean) ---