VoxForge
Dear Voxforge developers:
I have run into a problem: when I use MFCC_0_D_A_Z as the feature parameter type, recognition of live microphone input is worse than with MFCC_0_D_N_Z.
However, with MFCC file input, MFCC_0_D_A_Z performs better than MFCC_0_D_N_Z.
Why? Is there some other configuration option that must be set?
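For readers unfamiliar with the two parameter kinds, the qualifier suffixes can be decoded as in the sketch below. The qualifier meanings follow the HTK naming convention; the helper names `describe` and `vector_size`, and the default of 12 cepstral coefficients, are assumptions made for illustration and are not taken from this thread.

```python
# Meanings of the HTK parameter-kind qualifiers used in this thread.
HTK_QUALIFIERS = {
    "E": "log energy appended",
    "N": "absolute energy suppressed",
    "D": "delta coefficients appended",
    "A": "acceleration coefficients appended",
    "Z": "zero-mean static coefficients (cepstral mean normalisation)",
    "0": "0th cepstral coefficient appended",
}


def describe(kind):
    """Split a kind such as 'MFCC_0_D_A_Z' into its base and qualifier meanings."""
    base, *quals = kind.split("_")
    return base, [HTK_QUALIFIERS[q] for q in quals]


def vector_size(kind, num_ceps=12):
    """Feature dimension, assuming num_ceps static cepstra (hypothetical default)."""
    _, *quals = kind.split("_")
    # Static part: the cepstra plus one energy-like term for _E or _0.
    static = num_ceps + sum(1 for q in quals if q in ("E", "0"))
    size = static
    if "D" in quals:
        size += static  # deltas of every static coefficient
    if "A" in quals:
        size += static  # accelerations of every static coefficient
    if "N" in quals:
        size -= 1  # assumption: the absolute energy-like term is dropped, its delta is kept
    return size


for kind in ("MFCC_0_D_A_Z", "MFCC_0_D_N_Z"):
    print(kind, vector_size(kind), describe(kind)[1])
```

Under these assumptions the two kinds differ in dimension (39 versus 25), so a model trained with one cannot be decoded with features of the other.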
--- (Edited on 3/10/2008 8:33 am [GMT-0500] by xmuasrer) ---
>Why?
If you are asking about the VoxForge model, your statement is not true, or you simply have insufficient data to estimate batch-processing performance.
If you retrained the model with a different feature set, the result depends on many factors, the initial CMN values for example. In fact, the only significant difference between live and batch recognition is how CMN is calculated. The second feature set could be much more robust to an incorrect mean estimate. So first of all, try experimenting with CMN in the decoder; if that does not help, you can just leave it as is. The gain will certainly not be big.
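To make the batch versus live difference concrete, here is a minimal sketch with made-up two-dimensional "cepstra". It assumes the microphone adds a constant offset to every frame; the offset values and the function names are hypothetical and only illustrate the effect of a wrong mean estimate.

```python
def mean_vector(frames):
    """Per-dimension mean over all frames."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def subtract_mean(frames, mean):
    """Cepstral mean normalisation with a given mean vector."""
    return [[x - m for x, m in zip(f, mean)] for f in frames]


# Toy data; the numbers are invented for illustration only.
clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
channel_offset = [5.0, -1.0]  # hypothetical microphone/channel bias
live = [[x + o for x, o in zip(f, channel_offset)] for f in clean]

# Batch: the mean is taken over the whole utterance, so the offset cancels.
batch_norm = subtract_mean(live, mean_vector(live))

# Live with a stale initial mean (here: the mean of the clean condition).
stale_norm = subtract_mean(live, mean_vector(clean))

print(batch_norm)  # same as normalising the clean data
print(stale_norm)  # every frame still carries the channel offset
```

In the batch case the normalised features match those of the clean data, while with the stale mean every frame stays shifted by the channel offset, which is the kind of mismatch a decoder sees at the start of live input.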
--- (Edited on 3/10/2008 12:34 pm [GMT-0500] by nsh) ---
I just used the WSJ0 corpus to train the acoustic model. I first coded the wav files into MFCC_0_D format and then used the HComp (HTK 3.4) command to convert them to the correct MFCC_0_D_A_Z feature format. I then used these MFCC files to train the model.
For batch recognition, I get a satisfactory result with MFCC_0_D_A_Z, but the live recognition result is very bad. The relevant part of the output is as follows:
Speech input:
speech input source = microphone
sampling freq. = 16000 Hz
threaded A/D-in = not supported (live input may be dropped)
zero frames stripping = on
silence cutting = on
level thres = 2000 / 32767
zerocross thres = 60 / sec.
head margin = 300 msec.
tail margin = 400 msec.
long-term DC removal = off
reject short input = off
----------------------- System Information end -----------------------
*************************************************************
* NOTICE: The first input may not be recognized, since *
* no initial CMN parameter is available on startup. *
* for MFCC01*
*************************************************************
------
### read waveform input
pass1_best: AT DAY
pass1_best_wordseq: <s> AT DAY </s>
pass1_best_phonemeseq: sil | ae t | d ey | sil
pass1_best_score: -4176.718262
### Recognition: 2nd pass (RL heuristic best-first)
WARNING: 00 _default: hypothesis stack exhausted, terminate search now
STAT: 00 _default: 0 sentences have been found
WARNING: 00 _default: got no candidates, output 1st pass result as a final result
STAT: 00 _default: 850 generated, 850 pushed, 185 nodes popped in 95
<search failed>
STAT: skip CMN parameter update since last input was invalid
Doesn't real-time processing use the previous input to calculate the current CMN parameters? If not, how do I set the initial CMN values? I also thought the cause might be the CMN calculation, but I can't find a suitable way to resolve it.
--- (Edited on 3/10/2008 9:44 pm [GMT-0500] by xmuasrer) ---
Hi xmuasrer,
>if does not ,how to CMN initial values ?
from the Julius manual:
-norealtime
Explicitly specify whether real-time (pipeline) processing will
be done in the first pass or not. For file input, the default
is OFF (-norealtime), for microphone, adinnet and NetAudio
input, the default is ON (-realtime). This option relates to
the way CMN is performed: when OFF, CMN is calculated for each
input using cepstral mean of the whole input. When the realtime
option is ON, MAP-CMN will be performed. When MAP-CMN, the cepstral
mean of last 5 seconds are used as the initial cepstral
mean at the beginning of each input. Also refer to "-progout".
-cmnsave filename
Save last CMN parameters computed while recognition to the specified
file. The parameters will be saved to the file in each
time a input is recognized, so the output file always keeps the
last CMN parameters. If output file already exist, it will be
overridden.
-cmnload filename
Load initial CMN parameters previously saved in a file by "-cmnsave".
Loading an initial CMN enables Julius to better recognize
the first utterance on a microphone / network input. Also
see "-cmnnoupdate".
-cmnmapweight
Specify weight of initial cepstral mean at the beginning of each
utterance for microphone / network input. Specify larger value
to retain the initial cepstral mean for a longer period, and
smaller value to rely more on the current input. (default:
100.0)
-cmnnoupdate
When microphone / network input, this option makes engine not to
update the cepstral mean at each input and force engine to use
the initial cepstral mean given by "-cmnload" permanently.
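As a rough illustration of what the weight in "-cmnmapweight" does, here is a minimal sketch of a MAP-style running mean. The update formula is an assumption about the general form of such an estimate (initial mean counted as `weight` pseudo-frames); it is not taken from the Julius source, and the function name `map_cmn` is hypothetical. Only the default weight of 100.0 comes from the manual text above.

```python
def map_cmn(frames, init_mean, weight=100.0):
    """Normalise frames one by one with a MAP-style running mean.

    Assumed update (not verified against Julius):
        mean_t = (weight * init_mean + sum of frames 1..t) / (weight + t)
    """
    running_sum = [0.0] * len(init_mean)
    normalised = []
    for t, frame in enumerate(frames, start=1):
        running_sum = [s + x for s, x in zip(running_sum, frame)]
        mean = [
            (weight * m0 + s) / (weight + t)
            for m0, s in zip(init_mean, running_sum)
        ]
        normalised.append([x - m for x, m in zip(frame, mean)])
    return normalised


# One-dimensional toy input whose true mean is 10.0 (invented values).
frames = [[10.0]] * 3

# A good initial mean (as "-cmnload" would supply) normalises from frame one.
print(map_cmn(frames, init_mean=[10.0]))

# A wrong initial mean with a large weight is corrected only slowly.
print(map_cmn(frames, init_mean=[0.0], weight=100.0))

# A small weight relies more on the current input.
print(map_cmn(frames, init_mean=[0.0], weight=1.0))
```

Under this assumed form, a larger weight keeps the initial mean in control for longer, which matches the manual's description and suggests why a poor or missing initial mean can hurt the first live utterances.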
However, based on the console output from the startup of Julius, this might be your problem:
threaded A/D-in = not supported (live input may be dropped)
You might try the Julius forum to get more info on this message.
Ken
--- (Edited on 3/11/2008 8:56 am [GMT-0400] by kmaclean) ---
Thank you for your reply!
After analyzing the related Julius code, I found that the message "threaded A/D-in = not supported (live input may be dropped)" may appear because I am not using pthread.
--- (Edited on 3/11/2008 8:50 am [GMT-0500] by xmuasrer) ---