Acoustic Model Discussions

One word grammar, always recognized?
User: lrascao
Date: 1/25/2007 11:55 am
Views: 15548
Rating: 38

Hi everybody, 

I'm doing a test run of Julian with a very simple grammar that contains only one word:

.grammar
S : NS_B SENT NS_E
SENT: NS_B NS_E
SENT: COMMAND_START_V

.voca
% NS_B
<s> sil

% NS_E
</s> sil

% COMMAND_START_V
COMPUTER k ax m p y uw dx er

When I start up Julian, no matter what I say, it always recognizes the same thing: "COMPUTER". Can somebody tell me what I am missing here?

Is there a way for Julian to output a confidence score for the recognized words?

thanks!
LR

--- (Edited on 1/25/2007 11:55 am [GMT-0600] by lrascao) ---

Re: One word grammar, always recognized?
User: kmaclean
Date: 1/25/2007 11:20 pm
Views: 3114
Rating: 41

Hi LR,

I assume you've trained your own Acoustic Model to recognize the word 'computer', because I get errors when I try to use it with the VoxForge Acoustic Model (one of the triphones that make up the word 'computer' was not trained in the VoxForge Acoustic Model). So I will use different words in my example.

Julian does have confidence scores, but there are a few things that may influence the outcome of your recognition results. 

I created the following grammar file for testing (and called it sample.grammar):

S : NS_B SENT NS_E
SENT: COMMAND_START_V

I then created a voca file called sample.voca:

% NS_B
<s> sil

% NS_E
</s> sil

% COMMAND_START_V
ACCOUNTING    ax k aw n t ix ng
SCOTTISH      s k aa dx ix sh

and then compiled it using:

mkdfa.pl sample

When I run julian using the following command:

./julian -input mic -C julian.jconf

I get the following results when I utter the word "ACCOUNTING":

pass1_best: <s> SCOTTISH
pass1_best_wordseq: 0 2
pass1_best_phonemeseq: sil | s k aa dx ix sh
pass1_best_score: -1926.307007

length: 55 frames (1.10 sec.)
### Recognition: 2nd pass (RL heuristic best-first with DFA)
samplenum=55
sentence1: <s> ACCOUNTING </s>
wseq1: 0 2 1
phseq1: sil | ax k aw n t ix ng | sil
cmscore1: 1.000 1.000 1.000
score1: -2444.878418
3 generated, 3 pushed, 4 nodes popped in 55

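Julian is a two-pass speech recognizer. The first pass reports a score of -1926.307007, and the second pass a score of -2444.878418. Strictly speaking, these are log-likelihood scores rather than confidence scores: they accumulate over every frame of the utterance, so they are only comparable between utterances of similar length. Julian's per-word confidence scores are on the cmscore1 line (one value between 0 and 1 for each word, including <s> and </s>). When you utter a word in the grammar file, you get a larger number (ignoring the minus sign) than when you utter a word that is not in your grammar file.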
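To make the difference between the two kinds of numbers concrete, here is a minimal Python sketch that pulls the second-pass results out of the console output above. It assumes the output has been captured as text in exactly the format of this run (the length, sentence1, cmscore1 and score1 lines); the helper names are hypothetical, and dividing the score by the frame count is an added assumption for comparing utterances of different lengths, not something Julian prints.

```python
import re

# Second-pass lines copied from the run above (Julian console output).
SAMPLE_OUTPUT = """\
length: 55 frames (1.10 sec.)
sentence1: <s> ACCOUNTING </s>
wseq1: 0 2 1
phseq1: sil | ax k aw n t ix ng | sil
cmscore1: 1.000 1.000 1.000
score1: -2444.878418
"""


def field(text, pattern):
    """Return the first capture group of a line-anchored pattern."""
    match = re.search(pattern, text, re.M)
    if match is None:
        raise ValueError(f"no line matching {pattern!r}")
    return match.group(1)


def parse_result(text):
    frames = int(field(text, r"^length: (\d+) frames"))
    words = field(text, r"^sentence1: (.*)$").split()
    cms = [float(x) for x in field(text, r"^cmscore1: (.*)$").split()]
    score = float(field(text, r"^score1: (\S+)$"))
    return {
        "words": words,
        # Confidence (0..1) per word, kept as pairs so repeated words survive.
        "confidence": list(zip(words, cms)),
        # Log-likelihood of the whole hypothesis; grows with utterance length.
        "score": score,
        # Length-normalized score (an assumption, not a Julian output).
        "score_per_frame": score / frames,
    }


result = parse_result(SAMPLE_OUTPUT)
print(result["confidence"])  # [('<s>', 1.0), ('ACCOUNTING', 1.0), ('</s>', 1.0)]
print(round(result["score_per_frame"], 2))  # -44.45
```

The confidence values are what you would threshold on; the raw score mainly tells you how well the whole hypothesis fits, scaled by how long you spoke.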

Your problem (scores that don't seem to make sense as confidence scores) is likely the result of running Julian without a CMN (cepstral mean normalization) parameter.

When you first start up Julian, you should see a notice like this:

------------- System Info end -------------

        ************************************************************
        * NOTICE: The first input may not be correctly recognized *
        *         since no CMN parameter is available on startup.  *
        ************************************************************

This is telling you that Julian takes the cepstral mean of the last 5 seconds of speech as the initial cepstral mean at the beginning of each input. In other words, Julian averages the previous 5 seconds of speech (the cepstral mean) and uses that average when recognizing the next input. On startup there is no previous speech, which is why, in Julian's default configuration, the first few utterances may not be recognized correctly while it builds up this average.
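The idea of normalizing each input with a mean taken from earlier speech can be sketched in a few lines of Python. This is a deliberately simplified model, not Julian's actual algorithm (which also blends in the current input as it arrives): feature vectors are plain lists, the class and parameter names are hypothetical, and the window is given in frames. At the 20 ms frame shift implied by the log above (55 frames = 1.10 sec), 5 seconds would be 250 frames.

```python
from collections import deque


class SlidingCMN:
    """Simplified cepstral mean normalization using previous speech only.

    Not Julian's actual algorithm: the mean for an utterance is taken from
    the last `window_frames` frames of earlier input (roughly 5 seconds).
    """

    def __init__(self, dim, window_frames, initial_mean=None, update=True):
        self.dim = dim
        self.history = deque(maxlen=window_frames)  # frames of earlier input
        self.initial_mean = initial_mean            # e.g. a mean saved earlier
        self.update = update                        # False: never learn a new mean

    def current_mean(self):
        if self.history:
            n = len(self.history)
            return [sum(f[i] for f in self.history) / n for i in range(self.dim)]
        if self.initial_mean is not None:
            return list(self.initial_mean)
        return None  # nothing heard yet and nothing loaded

    def normalize(self, frames):
        mean = self.current_mean()
        if mean is None:
            # No estimate available: in this sketch features pass through
            # unchanged, so the very first utterance is the one likely to suffer.
            out = [list(f) for f in frames]
        else:
            out = [[x - m for x, m in zip(f, mean)] for f in frames]
        if self.update:
            self.history.extend(frames)  # becomes the "previous speech"
        return out


cmn = SlidingCMN(dim=2, window_frames=250)
print(cmn.normalize([[1.0, 2.0], [3.0, 4.0]]))  # [[1.0, 2.0], [3.0, 4.0]]
print(cmn.normalize([[2.0, 3.0]]))              # [[0.0, 0.0]]
```

Passing `initial_mean` with `update=False` corresponds loosely to starting from a saved mean and never recalculating it.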

You can get around this by using "-cmnsave filename" to record a representative average for your environment, and then using "-cmnload filename" and "-cmnnoupdate" to use the CMN you saved instead of recalculating it on the fly. In theory, your confidence scores should then start looking reasonable, and you should be able to determine whether a word is in your grammar or not.
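The two-step workflow can be written down as two command lines. This Python sketch only builds them (it does not launch Julian); the flags are the ones named above, the base command is the one used earlier in this thread, and the file name cmn.dat and the helper names are hypothetical.

```python
import shlex

# Base command used earlier in this thread.
BASE = ["./julian", "-input", "mic", "-C", "julian.jconf"]


def calibration_cmd(cmn_file):
    # Run in your normal environment so Julian saves a representative mean.
    return BASE + ["-cmnsave", cmn_file]


def production_cmd(cmn_file):
    # Later runs: start from the saved mean and keep it fixed.
    return BASE + ["-cmnload", cmn_file, "-cmnnoupdate"]


print(shlex.join(calibration_cmd("cmn.dat")))
print(shlex.join(production_cmd("cmn.dat")))
```

Either list can be passed to `subprocess.run` as-is; building argument lists rather than shell strings avoids quoting problems with file names.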

Try this out, and let me know how you make out,

Ken 

 

 

--- (Edited on 1/26/2007 12:20 am [GMT-0500] by kmaclean) ---

Re: One word grammar, always recognized?
User: kmaclean
Date: 1/28/2007 10:32 am
Views: 3537
Rating: 30

Hi LR,

OK, after a bit of experimentation, it's not because of the CMN parameter.

It seems like Julian returns the best fit for whatever grammar you have, and if you have a one-word grammar, that means returning that single word every time.

This might be because the VoxForge Acoustic Models are not complete enough, or it might mean that you cannot use a single-word grammar. I am not sure, and have emailed the Julius/Julian maintainer for some clarification on this.

As a workaround, you might try creating a grammar with a few Out-of-Vocabulary words, in addition to the word you want to recognize, and then use your application to determine if your target word gets recognized.  Try something like this:

.grammar
S : NS_B SENT NS_E
SENT: COMMAND_START_V
SENT: OUT_OF_VOCABULARY


.voca
% NS_B
<s> sil

% NS_E
</s> sil

% COMMAND_START_V
COMPUTER k ax m p y uw dx er

% OUT_OF_VOCABULARY
list of other words & pronunciations, to give the recognizer alternatives to COMPUTER
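If you want to try several sets of filler words, generating the two files is easy to script. A minimal Python sketch, assuming the .grammar/.voca layout shown above; the filler words are the ones from the earlier sample.voca, and any words your own acoustic model covers would do. The function names are hypothetical.

```python
def build_grammar(target, target_pron, fillers):
    """Return (grammar_text, voca_text): one target word plus filler words."""
    grammar = "\n".join([
        "S : NS_B SENT NS_E",
        "SENT: COMMAND_START_V",
        "SENT: OUT_OF_VOCABULARY",
    ]) + "\n"
    voca = [
        "% NS_B", "<s> sil", "",
        "% NS_E", "</s> sil", "",
        "% COMMAND_START_V", f"{target} {target_pron}", "",
        "% OUT_OF_VOCABULARY",
    ]
    voca += [f"{word} {pron}" for word, pron in fillers]
    return grammar, "\n".join(voca) + "\n"


def write_files(prefix, grammar, voca):
    """Write prefix.grammar and prefix.voca, ready for: mkdfa.pl prefix"""
    with open(f"{prefix}.grammar", "w") as f:
        f.write(grammar)
    with open(f"{prefix}.voca", "w") as f:
        f.write(voca)


grammar, voca = build_grammar(
    "COMPUTER", "k ax m p y uw dx er",
    # Filler words taken from the earlier sample.voca example.
    [("ACCOUNTING", "ax k aw n t ix ng"), ("SCOTTISH", "s k aa dx ix sh")],
)
print(voca)
```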

You might also use the "-n candidatenum" parameter in Julian to return the top "n" sentence hypotheses. If COMPUTER is in the top 3, there is a greater likelihood that it was the word uttered.
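On the application side, the top-n check might look like the following Python sketch. It assumes each hypothesis is printed on its own sentenceN line, numbered like sentence1 in the run above; the sample output is made up for illustration and the function name is hypothetical.

```python
import re

# Made-up output for illustration, in the sentenceN format shown earlier.
SAMPLE = """\
sentence1: <s> SCOTTISH </s>
sentence2: <s> COMPUTER </s>
sentence3: <s> ACCOUNTING </s>
"""


def in_top_n(output, target, n=3):
    """True if `target` appears in any of the first n sentence hypotheses."""
    for rank, sentence in re.findall(r"^sentence(\d+): (.*)$", output, re.M):
        if int(rank) <= n and target in sentence.split():
            return True
    return False


print(in_top_n(SAMPLE, "COMPUTER", n=3))  # True
print(in_top_n(SAMPLE, "COMPUTER", n=1))  # False
```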

It might help if you could tell me why you want Julian to recognize only one word.

thanks, 

Ken 

--- (Edited on 1/28/2007 11:32 am [GMT-0500] by kmaclean) ---

Re: One word grammar, always recognized?
User: kmaclean
Date: 1/28/2007 2:38 pm
Views: 429
Rating: 41

More info...

Here is a link to a thread on the Asterisk-Speech-Rec mailing list archive (Asterisk is GPL software) discussing the LumenVox Speech Recognition Engine (LumenVox is proprietary software). They discuss how their Speech Recognition Engine returns recognition results. In the post they state:

The way the [LumenVox] Speech Engine works is such that it only ever returns
results that are in-grammar. Even if you said "Maybe" into a yes/no
grammar, the Engine would try and match it with one of your grammar
words (it would just come back with a very low confidence score).

So it may be that the problem we have here is that we do not have a robust enough Acoustic Model, and this is contributing to the strange confidence scores that Julius is returning.
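If the engine always answers with something from the grammar, the application has to do the rejecting itself. A minimal Python sketch of a confidence threshold over (word, confidence) pairs such as those on the cmscore1 line; the threshold of 0.5 and the confidence values in the examples are hypothetical and would need tuning on your own recordings.

```python
def accept(word_confidences, target, threshold=0.5):
    """Accept `target` only if it was recognized with confidence >= threshold.

    word_confidences: (word, cmscore) pairs for the best hypothesis.
    threshold: hypothetical starting point; tune it on your own recordings.
    """
    for word, cm in word_confidences:
        if word == target:
            return cm >= threshold
    return False  # target not in the hypothesis at all


print(accept([("<s>", 1.0), ("COMPUTER", 0.82), ("</s>", 1.0)], "COMPUTER"))  # True
print(accept([("<s>", 1.0), ("COMPUTER", 0.12), ("</s>", 1.0)], "COMPUTER"))  # False
```

As the cmscore1 values of 1.000 in this thread suggest, a grammar with very few alternatives can report high confidence regardless of what was said, so a threshold works best together with filler words in the grammar.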

Ken

 

--- (Edited on 1/28/2007 3:38 pm [GMT-0500] by kmaclean) ---

Re: One word grammar, always recognized?
User: lrascao
Date: 1/31/2007 9:22 am
Views: 3892
Rating: 28

Hi kmaclean,

Thanks for your help, and sorry for the delay in replying; I have been away for a while. Shortly after posting the question, the problem you mentioned about the grammar occurred to me. I inserted more words other than COMPUTER, and now Julian returns a RECOGFAIL (in module mode) when I say some gibberish. I guess there's no way around this. My guess is that the engine tries to match the input with the elements in the grammar; when there is only one element, it doesn't even bother to make a match and just returns that one element with a CM score of 1.0, probably due to my acoustic model (it's not VoxForge's).
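In module mode, a client has to tell a rejection apart from a normal result. A minimal Python sketch, assuming result messages shaped like SAMPLE_MSG below (RECOGOUT/SHYPO/WHYPO tags with a CM attribute, and RECOGFAIL on rejection); this message format is an assumption here and should be compared with what your Julian version actually sends. Socket handling is left out and the function name is hypothetical.

```python
import re

# Assumed shape of a module-mode result; check against your Julian version.
SAMPLE_MSG = """\
<RECOGOUT>
  <SHYPO RANK="1" SCORE="-2444.878418">
    <WHYPO WORD="<s>" PHONE="sil" CM="1.000"/>
    <WHYPO WORD="ACCOUNTING" PHONE="ax k aw n t ix ng" CM="1.000"/>
    <WHYPO WORD="</s>" PHONE="sil" CM="1.000"/>
  </SHYPO>
</RECOGOUT>
"""

# Regexes rather than an XML parser: WORD="<s>" is not well-formed XML.
BEST = re.compile(r'<SHYPO RANK="1".*?</SHYPO>', re.S)
WHYPO = re.compile(r'<WHYPO WORD="([^"]*)"[^>]*?CM="([0-9.]+)"')


def parse_module_message(msg):
    """None for RECOGFAIL, else (word, cm) pairs of the best hypothesis."""
    if "<RECOGFAIL" in msg:
        return None
    best = BEST.search(msg)
    if best is None:
        return []
    return [(w, float(cm)) for w, cm in WHYPO.findall(best.group(0))]


print(parse_module_message("<RECOGFAIL/>"))  # None
print(parse_module_message(SAMPLE_MSG))  # [('<s>', 1.0), ('ACCOUNTING', 1.0), ('</s>', 1.0)]
```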

P.S. Thanks for the cmnsave tip; I will surely use it from now on.

--- (Edited on 1/31/2007 9:22 am [GMT-0600] by lrascao) ---
