Acoustic Model Discussions

MFCC format of Julius 4.0
User: xmuasrer
Date: 3/10/2008 8:33 am
Views: 8295
Rating: 19

Dear Voxforge developers:

I have run into a problem: when I use MFCC_0_D_A_Z as the feature parameter type, recognition performance on live microphone input is worse than with MFCC_0_D_N_Z.

With MFCC file input, however, MFCC_0_D_A_Z performs better than MFCC_0_D_N_Z.

Why? Is there some other configuration option that must be set?

--- (Edited on 3/10/2008 8:33 am [GMT-0500] by xmuasrer) ---

Re: MFCC format of Julius 4.0
User: nsh
Date: 3/10/2008 12:34 pm
Views: 219
Rating: 26

>Why?

If you are asking about the VoxForge model, your statement is not true, or you simply have insufficient data to estimate performance in batch processing.

If you retrained the model with a different feature set, it depends on many factors, CMN initial values for example. In fact, the only significant difference between live and batch recognition is the CMN calculation. The second feature set could be much more resistant to an incorrect mean estimate. So first of all you can try experimenting with the CMN settings in the decoder, but if that turns out to be useless you can just leave it as is. The gain will not be big for sure.
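
To make the CMN point concrete, here is a minimal sketch in plain Python. It is not Julius code: the two-dimensional "cepstra", the channel offset, and the helper names are all made up for illustration. It only shows why subtracting the mean of the whole input (batch) cancels a constant channel offset, while a live decoder that starts without a usable mean leaves that offset in the features.

```python
# Sketch: per-utterance (batch) CMN versus starting live input with no mean.
# All values and function names here are hypothetical, for illustration only.

def cepstral_mean(frames):
    """Average each cepstral dimension over all frames."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]

def subtract_mean(frames, mean):
    """Subtract a mean vector from every frame."""
    return [[x - m for x, m in zip(f, mean)] for f in frames]

# A toy utterance whose "clean" features already average to zero.
clean = [[1.0, 2.0], [-1.0, 0.0], [0.0, -2.0]]

# The recording channel adds a constant offset to every frame.
offset = [5.0, -3.0]
observed = [[x + o for x, o in zip(f, offset)] for f in clean]

# Batch mode: the mean of the whole input is known, so the offset cancels.
batch = subtract_mean(observed, cepstral_mean(observed))

# Live mode at startup: no mean is available yet; assume zero here.
live_start = subtract_mean(observed, [0.0, 0.0])

batch_residual = cepstral_mean(batch)      # offset removed
live_residual = cepstral_mean(live_start)  # offset still present
```

In this toy case the batch output equals the clean features, while the live output still carries the full offset. A model trained on batch-normalized features will see mismatched input in the second case.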

--- (Edited on 3/10/2008 12:34 pm [GMT-0500] by nsh) ---

Re: MFCC format of Julius 4.0
User: xmuasrer
Date: 3/10/2008 9:44 pm
Views: 257
Rating: 25

I just used the WSJ0 corpus to train the acoustic model. I first coded the wav files into MFCC_0_D format and then used the HComp (HTK 3.4) command to convert them to the correct MFCC_0_D_A_Z feature format. I then used these MFCC files to train the model.

For batch recognition, I get a satisfactory result with MFCC_0_D_A_Z. But the live recognition result is very bad. The relevant part of the output is as follows:

Speech input:
     speech input source = microphone
           sampling freq. = 16000 Hz
          threaded A/D-in = not supported (live input may be dropped)
    zero frames stripping = on
          silence cutting = on
              level thres = 2000 / 32767
          zerocross thres = 60 / sec.
              head margin = 300 msec.
              tail margin = 400 msec.
     long-term DC removal = off
       reject short input = off

----------------------- System Information end -----------------------

 *************************************************************
 * NOTICE: The first input may not be recognized, since      *
 *         no initial CMN parameter is available on startup. *
 * for MFCC01*
 *************************************************************

------
### read waveform input
pass1_best:  AT DAY
pass1_best_wordseq: <s> AT DAY </s>
pass1_best_phonemeseq: sil | ae t | d ey | sil
pass1_best_score: -4176.718262
### Recognition: 2nd pass (RL heuristic best-first)
WARNING: 00 _default: hypothesis stack exhausted, terminate search now
STAT: 00 _default: 0 sentences have been found
WARNING: 00 _default: got no candidates, output 1st pass result as a final result
STAT: 00 _default: 850 generated, 850 pushed, 185 nodes popped in 95
<search failed>
STAT: skip CMN parameter update since last input was invalid

Doesn't real-time processing use the previous input to calculate the current CMN parameters? If it does not, how do I set the CMN initial values? I also thought the cause might be the CMN calculation, but I can't find a suitable way to resolve it.

--- (Edited on 3/10/2008 9:44 pm [GMT-0500] by xmuasrer) ---

Re: MFCC format of Julius 4.0
User: kmaclean
Date: 3/11/2008 7:56 am
Views: 295
Rating: 20

Hi xmuasrer,

>If it does not, how do I set the CMN initial values?

From the Julius manual:

     -norealtime
              Explicitly specify whether real-time (pipeline) processing will
              be done in the first pass or not.  For file input, the default
              is OFF (-norealtime), for microphone, adinnet and NetAudio
              input, the default is ON (-realtime).  This option relates to
              the way CMN is performed: when OFF, CMN is calculated for each
              input using cepstral mean of the whole input.  When the realtime
              option is ON, MAP-CMN will be performed.  When MAP-CMN, the
              cepstral mean of last 5 seconds are used as the initial cepstral
              mean at the beginning of each input.  Also refer to "-progout".

     -cmnsave filename
              Save last CMN parameters computed while recognition to the
              specified file.  The parameters will be saved to the file in each
              time a input is recognized, so the output file always keeps the
              last CMN parameters.  If output file already exist, it will be
              overridden.

     -cmnload filename
              Load initial CMN parameters previously saved in a file by
              "-cmnsave".  Loading an initial CMN enables Julius to better
              recognize the first utterance on a microphone / network input.
              Also see "-cmnnoupdate".

     -cmnmapweight
              Specify weight of initial cepstral mean at the beginning of each
              utterance for microphone / network input.  Specify larger value
              to retain the initial cepstral mean for a longer period, and
              smaller value to rely more on the current input.  (default:
              100.0)

     -cmnnoupdate
              When microphone / network input, this option makes engine not to
              update the cepstral mean at each input and force engine to use
              the initial cepstral mean given by "-cmnload" permanently.
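
To illustrate what the "-cmnmapweight" description means in practice, here is a rough Python sketch of a MAP-style running mean. The update formula, the function name `map_cmn`, and the test values are assumptions for illustration, not taken from the Julius source; only the default weight of 100.0 comes from the manual text above.

```python
# Sketch of MAP-style CMN for streaming input (assumed form, not Julius code).
# Assumed update at frame t (1-based):
#     mean_t = (weight * init_mean + sum of frames 1..t) / (weight + t)

def map_cmn(frames, init_mean, weight=100.0):
    """Normalize frames one at a time using an initial mean and a weight."""
    dim = len(init_mean)
    running_sum = [0.0] * dim
    normalized = []
    for t, frame in enumerate(frames, start=1):
        # Accumulate the current input.
        for d in range(dim):
            running_sum[d] += frame[d]
        # Blend the initial mean with the mean of the input seen so far.
        mean = [(weight * init_mean[d] + running_sum[d]) / (weight + t)
                for d in range(dim)]
        normalized.append([frame[d] - mean[d] for d in range(dim)])
    return normalized

# One-dimensional toy input: 100 frames whose true mean is 4.0,
# processed with a wrong initial mean of 0.0.
frames = [[4.0]] * 100
init = [0.0]

heavy = map_cmn(frames, init, weight=100.0)  # trusts the initial mean longer
light = map_cmn(frames, init, weight=1.0)    # adapts to the input quickly
```

With the larger weight, half of the offset is still left after 100 frames in this toy case, while the smaller weight has almost removed it. That is the trade-off the option describes: a good initial mean (for example one loaded with "-cmnload") is worth holding on to, a bad one is not.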

However, based on the console output from the startup of Julius, this might be your problem:

    threaded A/D-in = not supported (live input may be dropped)

You might try the Julius forum to get more info on this message. 

Ken 

--- (Edited on 3/11/2008 8:56 am [GMT-0400] by kmaclean) ---

Re: MFCC format of Julius 4.0
User: xmuasrer
Date: 3/11/2008 8:50 am
Views: 2814
Rating: 21

Thank you for your reply!

After analyzing the related Julius code, I found that the message "threaded A/D-in = not supported (live input may be dropped)" probably appears because I am not using pthread.

--- (Edited on 3/11/2008 8:50 am [GMT-0500] by xmuasrer) ---
