VoxForge
--- (Edited on 12/20/2006 11:11 pm [GMT-0600] by Visitor) ---
Hi scorpioh,
For readers who are not familiar with reverse word 3-grams: they are used in creating language models for dictation applications. The Julius speech recognition engine is used for dictation, whereas its close cousin "Julian" (used in the VoxForge tutorials and how-tos) is used for command-and-control and IVR-type applications. Julian uses a grammar file, not a language model. Dictation applications require acoustic models trained with much more speech audio data than command-and-control or IVR applications.
After a quick search of the HTK manual, I could not find any reference to reverse word 3-grams, reverse word trigrams, or reverse word n-gram files.
SRILM - The SRI Language Modeling Toolkit - includes training scripts that touch on the creation of reverse n-grams. The description of its reverse-text script says:
reverse-text reverses the word order in text files, line-by-line. Start- and end-sentence tags, if present, will be preserved. This reversal is appropriate for preprocessing training data for LMs that are meant to be used with the ngram -reverse option.
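For anyone who wants to see the idea without installing SRILM, here is a minimal Python sketch of what that reversal does. The script itself is my own; only the <s>/</s> tag convention comes from the SRILM description above:

    import sys

    # Reverse the word order of each line on stdin, keeping the
    # sentence start/end tags <s> and </s> in their usual positions.
    for line in sys.stdin:
        words = line.split()
        start = ["<s>"] if words[:1] == ["<s>"] else []
        end = ["</s>"] if words[-1:] == ["</s>"] else []
        body = words[len(start):len(words) - len(end)]
        print(" ".join(start + body[::-1] + end))

You would run it as something like "python reverse_text.py < corpus.txt > corpus.rev.txt" (file names are just examples).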
I believe SRILM creates ARPA format language models, which should be usable by Julius.
You might also try the CMU-Cambridge Statistical Language Modeling Toolkit.
In addition, you might also want to contact LEE Akinobu (ri at nitech.ac.jp), the main developer of Julius, directly.
Let us know how you make out, so others can benefit from your work.
thanks in advance!
Ken
--- (Edited on 12/21/2006 2:04 pm [GMT-0500] by kmaclean) ---
email from LEE Akinobu (Julius maintainer)
Hi Ken,
You can simply create a reverse 3-gram by the following steps:
1) Reverse all the word orders of the training corpus
2) Train 3-gram in normal way with the reversed corpus
Please note that the training parameters (cut-off, discounting method, etc.) should be the same as for the forward 2-gram.
Best Regards,
LEE Akinobu
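Putting Lee's two steps together, a rough sketch of the whole pipeline might look like the following. The ngram-count invocation assumes SRILM is installed, and the file names and cut-off values are placeholders; per Lee's note, the cut-offs and discounting should match whatever your forward model used:

    import subprocess

    # Step 1: reverse the word order of every training sentence
    # (sentence tags omitted here for brevity; see the sketch above).
    with open("corpus.txt") as fin, open("corpus.rev.txt", "w") as fout:
        for line in fin:
            fout.write(" ".join(reversed(line.split())) + "\n")

    # Step 2: train a 3-gram on the reversed corpus in the normal way.
    # The cut-off flags (-gt2min/-gt3min) must match the ones used
    # when training the forward model.
    subprocess.run([
        "ngram-count", "-order", "3",
        "-text", "corpus.rev.txt",
        "-lm", "reverse3gram.arpa",
        "-gt2min", "1", "-gt3min", "1",  # placeholder cut-offs
    ], check=True)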
--- (Edited on 1/2/2007 9:56 pm [GMT-0500] by kmaclean) ---
From lee.akinobu's post on the Julius forum:
Old Julius-3.x requires two N-gram models to run recognition: a forward 2-gram for the first pass and a backward 3-gram for the second pass. The backward 3-gram should be trained from the same corpus as the normal N-gram, with the word order reversed and with the same cut-off values.
Julius-4 can do recognition with only a forward N-gram or only a backward N-gram. Word probabilities for the reverse direction on each pass are calculated from the given N-gram by a Bayes assumption. Since the second pass produces the final output, we recommend using a backward N-gram.
Combining both, as in older versions, is still supported; in that case, the 2-gram part of the given forward N-gram is used on the first pass, and the given backward N-gram on the second pass.
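As an aside, the "Bayes assumption" mentioned above is, as far as I can tell, just the chain rule: a backward trigram probability P(w1 | w2 w3) can be recovered from forward probabilities as P(w1 w2 w3) / P(w2 w3). Here is a small sketch of that relation; this is my reading of the idea, not code from the Julius source:

    def backward_trigram_prob(p, w1, w2, w3):
        """Backward probability P(w1 | w2 w3) from a forward model.

        `p` is a hypothetical forward-model callable: p(w, *history)
        returns P(w | history).  By the chain rule:
            P(w1 w2 w3) = P(w1) * P(w2 | w1) * P(w3 | w1 w2)
            P(w2 w3)    = P(w2) * P(w3 | w2)
        and P(w1 | w2 w3) = P(w1 w2 w3) / P(w2 w3).
        """
        joint_123 = p(w1) * p(w2, w1) * p(w3, w1, w2)
        joint_23 = p(w2) * p(w3, w2)
        return joint_123 / joint_23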
--- (Edited on 2/28/2008 1:47 pm [GMT-0500] by kmaclean) ---