How to create reverse 3 gram for julius?
User: scorpioh
Date: 12/20/2006 11:11 pm
Views: 13578
Rating: 28
I am able to create a back-off bigram using HTK, which Julius uses for its first pass. But I have no idea how to create the reverse 3-gram needed for the second pass. Could anyone shed some light? Thanks.

--- (Edited on 12/20/2006 11:11 pm [GMT-0600] by Visitor) ---

Re: How to create reverse 3 gram for julius?
User: kmaclean
Date: 12/21/2006 1:04 pm
Views: 1730
Rating: 26

Hi scorpioh,

For readers who are not familiar with reverse word 3-grams: they are used in language models for dictation applications. The Julius speech recognition engine is used for dictation, whereas its close cousin "Julian" (used in the VoxForge tutorials and howtos) is used for command and control and IVR-type applications. Julian uses a grammar file, not a language model. Dictation applications require acoustic models trained with much more speech audio data than command and control or IVR applications.

After a quick search of the HTK manual, I could not find any reference to reverse word 3-gram, reverse word trigram, or reverse word n-gram files.

SRILM (the SRI Language Modeling Toolkit) includes training scripts that touch on the creation of reverse n-grams. The description of one of them, reverse-text, says:

reverse-text reverses the word order in text files, line-by-line. Start- and end-sentence tags, if present, will be preserved. This reversal is appropriate for preprocessing training data for LMs that are meant to be used with the ngram -reverse option.

I believe SRILM creates ARPA format language models, which should be usable by Julius.

You might also try the CMU-Cambridge Statistical Language Modeling Toolkit.

In addition, you might also want to contact LEE Akinobu (ri at nitech.ac.jp) directly, the main developer of Julius.

Let us know how you make out, so others can benefit from your work.

thanks in advance!

Ken

--- (Edited on 12/21/2006 2:04 pm [GMT-0500] by kmaclean) ---

Re: How to create reverse 3 gram for julius?
User: kmaclean
Date: 1/2/2007 8:54 pm
Views: 320
Rating: 16

email from LEE Akinobu (Julius maintainer)

Hi Ken,

You can simply create a reverse 3-gram by the following steps:

1) Reverse all the word orders of the training corpus
2) Train 3-gram in normal way with the reversed corpus

Please note that the training parameters (cut-off, discounting method, etc.) should be the same as for the forward 2-gram.

Best Regards,

LEE Akinobu
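
A minimal sketch of step 1 in Python, assuming that "reverse" means reversing the tokens within each sentence, sentence markers included (the interpretation confirmed later in this thread). The function names `reverse_sentence` and `reverse_corpus` are hypothetical helpers, not part of HTK, SRILM, or Julius. Step 2 is then ordinary 3-gram training on the output with whatever toolkit was used for the forward 2-gram.

```python
def reverse_sentence(line: str) -> str:
    """Reverse the whitespace-separated tokens of one sentence.

    Sentence markers are treated like any other token, so
    "<s> ... </s>" becomes "</s> ... <s>".
    """
    return " ".join(reversed(line.split()))


def reverse_corpus(lines):
    """Yield each non-empty sentence with its word order reversed.

    The order of the sentences themselves is left unchanged.
    """
    for line in lines:
        if line.strip():
            yield reverse_sentence(line)


if __name__ == "__main__":
    # Two sentences in the one-sentence-per-line layout used in this thread.
    corpus = [
        "<s> THE GAME IS AFOOT </s>",
        "<s> NOT A WORD </s>",
    ]
    for reversed_line in reverse_corpus(corpus):
        print(reversed_line)
```

To process a real corpus, the same generator can be fed an open file object and its output written line by line to a new file, which then becomes the training text for the reverse 3-gram.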

--- (Edited on 1/ 2/2007 9:54 pm [GMT-0500] by kmaclean) ---

Re: How to create reverse 3 gram for julius?
User: kmaclean
Date: 1/2/2007 8:55 pm
Views: 327
Rating: 15
Hi Lee,

Thanks for the quick reply!

I'm sorry, but I am not sure I understand what you mean by reversing the word order of the training corpus. 

The HTK training tutorial has entries like this (from htk-3.3/samples/LMTutorial/train/abbey_grange.txt):
<s> QUOTE COME WATSON COME QUOTE HE CRIED </s>
<s> THE GAME IS AFOOT </s>
<s> NOT A WORD </s>

Do you mean that we need to reverse the word order of each sentence to look like this:
<s> CRIED HE QUOTE COME WATSON COME QUOTE </s>
<s> AFOOT IS GAME THE </s>
<s> WORD A NOT </s>
and then train the 3-gram on this 'reversed' corpus?

-or-

do we reverse the order of the sentences in the corpus, as follows:
<s> NOT A WORD </s>
<s> THE GAME IS AFOOT </s>
<s> QUOTE COME WATSON COME QUOTE HE CRIED </s>

and then train the 3-gram on this corpus with the reversed sentence order?

Thanks for your help,

Ken

--- (Edited on 1/ 2/2007 9:55 pm [GMT-0500] by kmaclean) ---

Re: How to create reverse 3 gram for julius?
User: kmaclean
Date: 1/2/2007 8:56 pm
Views: 575
Rating: 8
> <s> QUOTE COME WATSON COME QUOTE HE CRIED </s>
> <s> THE GAME IS AFOOT </s>
> <s> NOT A WORD </s>

I mean completely reversing the word order, starting from the end of the text file.
The sentences above should become:

</s> WORD A NOT <s>
</s> AFOOT IS GAME THE <s>
</s> CRIED HE QUOTE COME WATSON COME QUOTE <s>

Note that <s> and </s> should also be reversed.

Since the order of the sentences in a corpus does not affect the resulting N-gram probabilities, it is enough to reverse the words within each sentence and keep the sentence order, like this:

</s> CRIED HE QUOTE COME WATSON COME QUOTE <s>
</s> AFOOT IS GAME THE <s>
</s> WORD A NOT <s>

Regards,

LEE Akinobu
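
A small Python check of the point about sentence order, offered as an illustration rather than as how any toolkit counts N-grams. It assumes trigrams are counted within each line only and never across sentence boundaries; `trigram_counts` is a hypothetical helper. Under that assumption, the two reversed layouts shown above give identical trigram counts.

```python
from collections import Counter


def trigram_counts(sentences):
    """Count token trigrams inside each sentence (assumption: no
    trigram spans a sentence boundary)."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - 2):
            counts[tuple(tokens[i:i + 3])] += 1
    return counts


# Words reversed within each sentence, original sentence order kept.
reversed_sentences = [
    "</s> CRIED HE QUOTE COME WATSON COME QUOTE <s>",
    "</s> AFOOT IS GAME THE <s>",
    "</s> WORD A NOT <s>",
]

same_order = trigram_counts(reversed_sentences)
# Fully reversed file: the same sentences, listed from last to first.
file_reversed = trigram_counts(list(reversed(reversed_sentences)))

# Compare the two sets of counts; sentence order should not matter.
print(same_order == file_reversed)
```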

--- (Edited on 1/ 2/2007 9:56 pm [GMT-0500] by kmaclean) ---

Re: How to create reverse 3 gram for julius?
User: kmaclean
Date: 2/28/2008 12:47 pm
Views: 6291
Rating: 19

From LEE Akinobu's post on the Julius forum:

The old Julius-3.x requires two N-gram models to run recognition: a forward 2-gram for the first pass and a backward 3-gram for the second pass. The backward 3-gram should be trained on the same corpus as the forward N-gram, with the word order reversed and with the same cut-off value.

Julius-4 can do recognition with only a forward N-gram or only a backward N-gram. Word probabilities for the direction that is not supplied are calculated on each pass from the given N-gram using Bayes' rule. Since the second pass produces the final output, we recommend using a backward N-gram.

Supplying both models, as in older versions, is still supported. In that case, the 2-gram part of the given forward N-gram is used on the first pass, and the given backward N-gram is used on the second pass.

--- (Edited on 2/28/2008 1:47 pm [GMT-0500] by kmaclean) ---
