VoxForge
Hi,
To answer your question first: no, there is no Italian section yet. VoxForge started with English and is still quite young.
However, this seems to be a recurring issue (and for good reasons) so I think it wouldn't hurt to talk about how to eventually add other languages. I think it would be great to have a recipe that explains all the requirements for adding other languages to the project.
People whose interest lies more in, for instance, Italian (you're not the first one to mention Italian!) could then already do some preparatory work. By the time Italian truly gets added, some of the work will already have been done!
Things that should be in this recipe for sure would be:
Where to start depends on your personal skills. Some work might already have been done elsewhere, for instance at a university that does research on phonetics. So it's wise not to start on a word list immediately, but to search for an existing one first!
There is also a lot of info on the VoxForge website (especially in the dev section).
Obviously, officially adding another language is in the end a decision for Ken (the project founder), since it requires a lot of work in the background!
Robin
--- (Edited on 6/5/2007 4:56 am [GMT-0500] by Robin) ---
Hi, I'm Italian too. After reading the tutorial on making my own acoustic model, I still don't understand how to create the statistical representation of phonemes.
It's clear how to make the grammar file and how to do the other tutorial steps, but not how to create the acoustic model itself.
I would like to create a simple acoustic model for Italian words. Is that possible?
I'm a programmer studying at the University of Bologna. I'm preparing my degree thesis on speech recognition, and I have to get something working on Italian words.
Tks
Manuel
Texts are just texts: books, newspapers and so on. Ideally they should be free, but copyrighted texts are also acceptable. They are needed to build the language model, which is only required for decoding, not for training.
Once you have the texts, put them somewhere so I can download them.
To be honest, it seems easier to me to train a Sphinx model than an HTK one, though Ken will probably correct me. So if you install sphinx3 and SphinxTrain, I can help you with the Italian setup. We have a dictionary and a phoneset. You just need to record a small amount of text (say, 200 utterances as wav files). Then we'll build the acoustic model.
Two points to discuss here:
For dictation applications you need to create a language model and an acoustic model. Language models require very large amounts of text (the 100 MB that nsh was referring to, for example).
For command and control or IVR apps, you use a grammar file (rather than a language model) and an acoustic model. You can have as few as 5 - 10 words in your grammar, but your system will only recognize those words. You do *not* need a large amount of text to do this type of speech recognition. This is the type of system that the VoxForge Tutorial helps you create.
This is based on my knowledge of HTK/Julius. Sphinx might do (and likely does ...) things differently.
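To illustrate why a language model needs so much text, here is a minimal Python sketch of the idea behind it: counting how often one word follows another and turning the counts into probabilities. It is only an illustration with a made-up three-sentence corpus and no smoothing; the function name `bigram_probabilities` is hypothetical, and this is not how HTK, Julius or Sphinx build their models.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Estimate P(next_word | word) by plain counting (no smoothing)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # <s> and </s> mark the start and end of each sentence.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for current, following in zip(words, words[1:]):
            context_counts[current] += 1
            bigram_counts[(current, following)] += 1
    return {
        pair: count / context_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


# Toy corpus, invented for illustration only.
toy_corpus = ["apri la porta", "chiudi la porta", "apri la finestra"]
probs = bigram_probabilities(toy_corpus)
print(probs[("la", "porta")])     # "porta" follows "la" in 2 of 3 cases
print(probs[("la", "finestra")])  # "finestra" follows "la" in 1 of 3 cases
```

With only three sentences, any word pair that never occurred gets no probability at all, which is why real language models are estimated from very large text collections. A grammar file avoids the problem by listing the allowed word sequences explicitly.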
Although we have not (yet ...) created a VoxForge language model, which would require 100 MB or more of text (used to find the probabilities of occurrence of words in different contexts), we do use public domain texts for our prompts, i.e. the prompts that tell users what to say when they want to submit their speech to VoxForge.
In this particular case, VoxForge tries to ensure that we only use non-copyrighted texts (or copyrighted texts with permissible licenses). We want to avoid a situation where we might be required to remove any speech from our corpus because of Copyright issues.
Creating a recording creates a "derivative work" of the original Copyrighted work. Therefore, if the text you are recording is still covered under Copyright, the Copyright holder retains rights to any "derived" works (in this case the recording you made of their work), and can prevent you from making copies or distributing such derived works.
In addition, the original Copyright holder's rights might apply to any acoustic models created from speech recordings made from the reading of their Copyrighted work, since these might also be considered a derivative work.
Copyright might not apply to the text used in the creation of language models since all you are doing is creating a list of the probabilities of the words in different contexts. However, if you want to include the source text with your language model (as the use of a GPL license would require), and distribute it, then you would be limited to using out-of-copyright texts.
Only a court can say for certain one way or another, so the approach we are taking at VoxForge is a conservative one, and thus we try to avoid using copyrighted works.
There are other options:
you can still create your own texts and assign them to the public domain (this would make sense for the creation of new prompts), or
go to the Project Gutenberg site and use some of the out-of-copyright texts they have on their Italian page.
Having said all this, if you don't plan to distribute your texts, then there is not much a Copyright holder can do to stop you.
I don't really know ... Sphinx has come a long way since I last looked at Acoustic Model creation with them.
Regardless, since you will have the source speech audio, you will always have the option of creating HTK acoustic models at a later date.
hope this helps,
Ken
Well, I created a simple Italian setup for SphinxTrain. You can download it from:
http://nshmyrev.narod.ru/temp/voxforge_it_sphinx.tar.gz
The remaining work is very easy:
1. Put more wave files in wav subdir
2. Update fileids
3. Update transcription
4. Update dictionary if required
5. Run ./scripts.pl/make_feats -ctl etc/voxforge_it_train.fileids
6. Run ./scripts.pl/RunAll.pl
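Steps 2 to 4 above can be partly automated. The Python sketch below is one possible way, under stated assumptions: the fileids file lists one recording name per line without the `.wav` extension, each transcription line has the form `<s> text </s> (name)`, and each dictionary line starts with the word followed by its phones. The function names, the `prompts` mapping and the sample file names are hypothetical; check the files in the downloaded archive for the exact layout and letter case before relying on this.

```python
from pathlib import Path


def build_control_lines(wav_dir, prompts):
    """Return (fileids, transcriptions) for every .wav that has a prompt.

    prompts maps a recording name without extension (a hypothetical
    naming convention) to the text that was read aloud.
    """
    fileids, transcriptions = [], []
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        utt_id = wav_path.stem
        if utt_id not in prompts:
            continue  # skip recordings whose text is unknown
        text = " ".join(prompts[utt_id].split())  # normalise whitespace
        fileids.append(utt_id)
        transcriptions.append(f"<s> {text} </s> ({utt_id})")
    return fileids, transcriptions


def missing_words(transcriptions, dictionary_lines):
    """List transcript words that have no entry in the dictionary."""
    known = set()
    for line in dictionary_lines:
        if line.strip():
            # "porta(2)" style alternate pronunciations count as "porta".
            known.add(line.split()[0].split("(")[0])
    needed = set()
    for line in transcriptions:
        for token in line.split():
            if token in ("<s>", "</s>") or token.startswith("("):
                continue  # sentence markers and the trailing (name)
            needed.add(token)
    return sorted(needed - known)
```

Writing the two returned lists to the fileids and transcription files under `etc/`, one entry per line, covers steps 2 and 3. Any words reported by `missing_words` are the ones that need new dictionary entries in step 4.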
Hi, I'm trying to install Sphinx 3 to use your Italian acoustic model, but when I start the Perl script decode/slave.pl, it gives me an error because it can't find script_pl/util/utils.pl
That file doesn't exist in my directory. Can you help me?
Tks
Manuel