VoxForge
In this thread we discussed whether we (namely Rauf and I) should upload our training scripts under the GPL so that others could use them. I said that my script isn't ready for publishing, but I offered to clean it up, add comments, etc., and Rauf offered to do the same with his.
This would obviously require some effort and would result in yet another HTK training script/wrapper. There are already plenty of these.
The disadvantage of these is that the code is not always easy to navigate and that the scripts are not necessarily up to date with new versions of HTK.
My idea is that if we spend time building a well-documented, well-commented automatic HTK trainer, then we should make one that is distributed through the VoxForge website and regularly updated. I have some ideas about the design of such a script/library, so I started this thread to discuss them.
But before we start such work, I want to ask whether it is a good idea. It might involve a lot of work, so would anyone actually use it? Or would people rather program their own training scripts based on the available tutorials and documentation? If that is the case, then it would be easier to just update the tutorials with new information about HDecode, etc. (see this thread).
--- (Edited on 23.04.2009 11:10 [GMT+0200] by tpavelka) ---
I also don't know whether it is a good or bad idea, but I think the tutorials should be updated in any case, and new tutorials should be added, in the same style as those on this website.
What I did in my batch file is simply automate all the manual steps in the HTK tutorial, nothing very serious. For example, it:
* Checks the sampling rate and bit depth of the wav files before starting training
* Checks the content of the lab files (if there are any) for incorrect phone names
* Creates lists for coding the wav files, and runs HCopy
* Creates a list of all mfc files and a list of the mfc files that have lab files, for running HInit and HRest
* Creates word-level transcription files
* Creates the 'sp' model from 'sil'
and so on.
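As an illustration of the first and third items in the list above, here is a minimal Python sketch. It is not the batch file itself: the function names, the 16 kHz / 16-bit targets and the ".mfc" extension are assumptions chosen for the example, and the list it builds follows the "source target" line format that an HCopy script file uses.

```python
import wave
from pathlib import Path

# Assumed targets; adjust to whatever the HTK config expects.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples


def check_wav(path, rate=EXPECTED_RATE_HZ, width=EXPECTED_SAMPLE_WIDTH_BYTES):
    """Return a list of problems found in one wav file (empty if none)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(
                f"sampling rate {wav.getframerate()} Hz, expected {rate}"
            )
        if wav.getsampwidth() != width:
            problems.append(
                f"{wav.getsampwidth() * 8}-bit samples, expected {width * 8}"
            )
    return problems


def hcopy_script_lines(wav_dir, mfc_dir):
    """Build 'source target' pairs, one per line, for an HCopy script file."""
    lines = []
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        mfc_path = Path(mfc_dir) / (wav_path.stem + ".mfc")
        lines.append(f"{wav_path} {mfc_path}")
    return lines
```

Running check_wav over every file first and stopping on any non-empty result catches bad recordings before HCopy is run.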
--- (Edited on 4/23/2009 6:20 am [GMT-0500] by Rauf) ---
Hi tpavelka/Rauf,
Sorry for the delay in getting back to you on this, been busy cleaning up the non-English corpora for the migration to Drupal.
tpavelka said:
>I think there are two possible solutions:
>1) What I have is just another HTK training recipe,...
>2) We can add new tutorials. I can help with that, but I need someone
>to do a review because I make a lot of mistakes (or maybe we could
>write it like a wiki?[...]
VoxForge currently uses the WebGUI Content Management system, but I am slowly working on migrating to Drupal (the migration scripts are pretty well completed...). I am working on fixing things on the backend to make this happen.
Regardless, I can create a section on the current VoxForge site where any registered user can update content using the same web-based editor you use for posting to a WebGUI forum (tinyMCE). I already have a FAQ entry called Editing Content with WebGUI to help you get up to speed.
I can definitely help out with editing. If you guys are still interested (and anyone else) I can set this up quite easily.
With respect to creating "just another training recipe", your point is well taken. If you decide that you are interested in updating the VoxForge Tutorial with more current HTK/Julius info, then I would see the need to also update the training recipe in the Howto. My script was created when I was just learning scripting, so there is lots of room for improvement...
However, if you are more interested in adding new/different tutorial material, like the stuff you listed in your previous emails:
then an updated training recipe may not be required (though it will remain on my todo list...).
Please let me know how you want to proceed.
thanks,
Ken
--- (Edited on 5/3/2009 9:53 pm [GMT-0400] by kmaclean) ---
Hi,
I have not responded because I haven't figured out a complete solution. So I guess I will just summarise the things I have come up with. I do not claim all these things are correct or complete, so use/discard/postpone/discuss them however you see fit.
First, regarding the yet another training script: I will soon leave my job at the university and I am currently trying to document/clean up my work in ASR. If an HTK training script comes out of this effort, I will GPL it and post a link, and you can mirror it on the VoxForge website if you want.
I have checked the tutorials and they are more extensive than I thought. Pretty much everything important is there. Some of the information is out of date due to new versions of HTK/Julius but this can be fixed. The hard part is keeping it consistent because the tutorials are aimed at speech that someone can record by themselves. In that case doing the LVCSR stuff (HDecode, cross-word triphones) is useless, because one person is not very likely to record enough data for that.
One idea I have come up with is to use the comments on the tutorial pages for tips & tricks (such as the one about generating all possible triphones for testing), as is done e.g. in the PHP manual pages. Right now many of the comments there are people asking for help, which is, in my opinion, the wrong place. If someone wants help, they usually need the answer ASAP, so it is desirable for as many people as possible to see the question. For this reason I think requests for help should only be posted to the forums, because not that many people regularly visit the tutorial pages. The question is how to get people to do so.
Another kind of post reports errors in the tutorial pages. I think these should be moved/hidden once the errors are resolved, so that the tips & tricks are at the top of the comments, which is the part people read the most. The issues that arise from new versions of the HTK tools can also be posted in the tips & tricks part.
One more thing that could improve navigation would be to make a table of contents for all the tutorial pages (with the steps described in full, not just "step 1", "step 2" etc. as is done in the sidebar for the lack of space). Maybe post a link to this from each of the tutorial pages.
I know that since you are migrating to Drupal, the page's organization might change so these are just ideas about what could be done.
--- (Edited on 13.05.2009 11:19 [GMT+0200] by tpavelka) ---
Hi tpavelka,
>If an HTK training script comes out of this effort I will GPL it and post a
>link and you can mirror it on the VoxForge website if you want.
It can't hurt to GPL your code even if you don't finish cleaning it up; I may still be able to use bits and pieces when updating the VoxForge AM creation script down the road.
>I have checked the tutorials and they are more extensive than I
>thought. Pretty much everything important is there.
That is good to know. Thank you for reviewing them!
>Some of the information is out of date due to new versions of
>HTK/Julius but this can be fixed.
agree
>In that case doing the LVCSR stuff (HDecode, cross-word triphones)
>is useless, because one person is not very likely to record enough
>data for that.
There might be room for a tutorial on how to adapt the VoxForge acoustic model to one's own voice to improve recognition, and also for a step-by-step example of how to create Language Models for dictation-type apps, but that is for later...
>use the comments on the tutorial pages for tips & tricks
Your point is very valid, though I am "on the fence" about it. There are pros and cons to either approach...
VoxForge is still a quite small project (unlike PHP) and letting questions appear in the tutorial is (I think...) OK for now. In the long run, I agree, it is best if only tips and tricks are in the tutorials...however, I am just happy that people are actually reading the thing... :)
The migration to Drupal will help in this regard because I can easily create a centralized area where all new comments are displayed, so if there are questions, no matter where they were made (i.e. in the Tutorial or otherwise), they can be easily seen on the Forums page. On the other hand, Drupal has quite extensive tagging facilities, so I think we could display content posted to a forum on a tutorial page using tagging...
If we were to limit postings to tips and tricks on a Tutorial page, how might we guide users to do that? Label it as "user contributed notes" and have a link for questions in a forum?
>Another kind of post reports errors in the tutorial pages. I think these
>should be moved/hidden once the errors are resolved,
Agree - I have not been that diligent in this regard...
>improve navigation would be to make a table of contents for all the
>tutorial pages (with the steps described in full
Agree - at one time, you could hover over the side menu and have the page title appear, but after an update to WebGUI this disappeared (I think WebGUI moved to a different JavaScript library: YUI)... I never got around to fixing it.
>migrating to Drupal, the page's organization might change so these
>are just ideas about what could be done
Thanks, your feedback is greatly appreciated.
Ken
--- (Edited on 5/13/2009 1:10 pm [GMT-0400] by kmaclean) ---
>I have checked the tutorials and they are more extensive than I
>thought. Pretty much everything important is there.
> That is good to know. Thank you for reviewing them!
Hm, what about MPE/HMMIRest? It's a very important feature now, the one that makes HTK attractive. It would be nice to have a section about it in the tutorial. Dr Robinson would also be grateful if it could be added to the HTKTimit.sh script.
--- (Edited on 5/13/2009 4:41 pm [GMT-0500] by nsh) ---
Hi nsh,
I have never tried discriminative training, even though I agree that it would be a nice thing to have. As I understand it, there are quite a lot of parameters that can be tuned. If we write a tutorial for VoxForge, we should first try to find a setup that leads to better results than the usual approach with HERest.
I have tried to follow the tutorial in the new HTKBook and encountered some problems ;-)
First I built a bigram language model with approximately the same number of bigrams as unigrams (this is suggested in the tutorial). Here are the counts and perplexity:
ngram 1=14075
ngram 2=13507
perplexity 71.4761, var 11.3587, utterances 43541, words predicted 450872
num tokens 494413, OOV 0, OOV rate 0.00% (excl. </s>)
Next, I have run HDecode, like this:
HDecode.orig.exe -H hmms/my_0_D_N_Z_synthesized/hmmdefs -H hmms/my_0_D_N_Z_synthesized/macros -C config_0_D_N_Z_from_pcm_htklm -S lists/pcm_development_Voxforge_same_as_sphinx.txt -i wlat.den/recout.mlf -t 220.0 -w lm/bigrams_4cutoff.txt -p 0.0 -s 5.0 -o M -z lat -l wlat.den -X lat dict/dict_train hmms/my_0_D_N_Z_synthesized/xwrdtiedlist
And, now the problem:
The speed is around 50xRT on my 3.2 GHz P4. On top of that, one generated lattice is about 20 MB. Given that VoxForge has about 58 hours of speech in 40k files, this would translate to about 100 days of training, which would generate about 1 TB of lattices (that is, if I am counting correctly). And this is just Step 3 - Word Lattice Creation as described in the HTKBook.
So, my question is, is this usual, or did I do something wrong?
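As a rough check of the estimate above, here is a minimal Python sketch that uses only the figures quoted in this post. It assumes decimal units (1 GB = 1000 MB) and that every file produces a lattice of about the same size.

```python
# Figures quoted in the post above.
audio_hours = 58      # total speech in the corpus
num_files = 40_000    # number of utterance files
rt_factor = 50        # decoding speed, times real time
lattice_mb = 20       # size of one generated lattice, in MB

decode_hours = audio_hours * rt_factor
decode_days = decode_hours / 24
total_lattice_gb = num_files * lattice_mb / 1000  # decimal units (assumed)

print(f"decoding time: {decode_hours} h = {decode_days:.0f} days")
print(f"lattice storage: {total_lattice_gb:.0f} GB")
```

This gives 2900 hours (roughly 121 days) of decoding and about 800 GB of lattices, which is the same order of magnitude as the figures above.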
--- (Edited on 15.05.2009 12:15 [GMT+0200] by tpavelka) ---
@Ken: > use the comments on the tutorial pages for tips & tricks
I agree that the Tips & Tricks section of the tutorials is more like a vision for the future. I brought it up because it somehow solved the problem about what to do with information that does not directly fit into the tutorials (e.g. other ways of doing the same like the post about new triphones generation).
The new Drupal features seem pretty interesting, so let's wait and see how that turns out.
--- (Edited on 15.05.2009 13:55 [GMT+0200] by tpavelka) ---
Hi Tomas
The HTK samples include a script that could give some more ideas on what to do about lattice generation. For example, the size is indeed an issue, but it is solved by using gzip as a lattice filter. As I understand it, they also don't use a large bigram there; they generate the bigram from the speaker's prompts.
As for the 50xRT, I think the decoder should run at normal speed with the usual parameters. That's the idea of MPE training: we get one set of lattices from normal decoding and another from alignment. For our task it could be 4xRT, but certainly not 50x.
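To get a feel for how much gzip helps with lattice size, here is a minimal Python sketch that compresses already generated lattices after the fact. This is not HTK's own filter mechanism: the function name is hypothetical, and the ".lat" extension is assumed from the HDecode command posted earlier in the thread. Tools that read the lattices afterwards would still need to be set up to decompress them, which this sketch does not cover.

```python
import gzip
import shutil
from pathlib import Path


def gzip_lattices(lattice_dir, pattern="*.lat"):
    """Compress every matching lattice to <name>.gz and remove the original."""
    compressed = []
    for lat_path in sorted(Path(lattice_dir).glob(pattern)):
        gz_path = lat_path.with_name(lat_path.name + ".gz")
        # Stream the file through gzip so large lattices are not held in memory.
        with open(lat_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        lat_path.unlink()
        compressed.append(gz_path)
    return compressed
```

Comparing the directory size before and after a call such as gzip_lattices("wlat.den") shows the compression ratio for a given set of lattices.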
--- (Edited on 5/17/2009 9:22 am [GMT-0500] by nsh) ---
Hi,
I wanted to try something with discriminative training, but unfortunately I have been too busy finishing my PhD and looking for a new job. However, I am still planning to clean up my training script and GPL it, as I promised. I will post it once it is done.
Tomas
--- (Edited on 15.06.2009 09:44 [GMT+0200] by tpavelka) ---