VoxForge
C:\> julius-4.3.1 -input mic -C Sample.jconf
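The Sample.jconf passed with -C collects the engine options in one file. A minimal sketch, assembled from the model and grammar filenames that appear later in this log (the exact option set is an assumption; verify it against the Julius manual for your version):

```
# Sample.jconf -- minimal grammar-recognition setup (sketch)
-h      hmm15/hmmdefs    # acoustic model (HTK hmmdefs, ascii)
-hlist  tiedlist         # HMMList mapping logical to physical models
-dfa    sample.dfa       # finite-state grammar
-v      sample.dict      # word dictionary for the grammar
```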
STAT: include config: Sample.jconf
STAT: jconf successfully finalized
STAT: *** loading AM00 _default
Stat: init_phmm: Reading in HMM definition
Stat: rdhmmdef: ascii format HMM definition
Stat: rdhmmdef: limit check passed
Stat: check_hmm_restriction: an HMM with several arcs from initial state found:
"sp"
Stat: rdhmmdef: this HMM requires multipath handling at decoding
Stat: rdhmmdef: no <SID> embedded
Stat: rdhmmdef: assign SID by the order of appearance
Stat: init_phmm: defined HMMs: 46
Stat: init_phmm: loading ascii hmmlist
Stat: init_phmm: logical names: 489 in HMMList
Stat: init_phmm: base phones: 41 used in logical
Stat: init_phmm: finished reading HMM definitions
STAT: m_fusion: force multipath HMM handling by user request
STAT: making pseudo bi/mono-phone for IW-triphone
Stat: hmm_lookup: 364 pseudo phones are added to logical HMM list
STAT: *** AM00 _default loaded
STAT: *** loading LM00 _default
STAT: reading [sample.dfa] and [sample.dict]...
Stat: init_voca: read 18 words
STAT: done
STAT: Gram #0 sample registered
STAT: Gram #0 sample: new grammar loaded, now mash it up for recognition
STAT: Gram #0 sample: extracting category-pair constraint for the 1st pass
STAT: Gram #0 sample: installed
STAT: Gram #0 sample: turn on active
STAT: grammar update completed
STAT: *** LM00 _default loaded
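The sample.dict loaded above pairs each of the 18 words with a grammar category number and a phone sequence. A hypothetical two-entry sketch in Julius's grammar dictionary format (the words and phones here are invented for illustration, not taken from the real file):

```
0	[PHONE]	f ow n
1	[DIAL]	d ay ah l
```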
STAT: ------
STAT: All models are ready, go for final fusion
STAT: [1] create MFCC extraction instance(s)
STAT: *** create MFCC calculation modules from AM
STAT: AM 0 _default: create a new module MFCC01
STAT: 1 MFCC modules created
STAT: [2] create recognition processing instance(s) with AM and LM
STAT: composing recognizer instance SR00 _default (AM00 _default, LM00 _default)
STAT: Building HMM lexicon tree
STAT: lexicon size: 210 nodes
STAT: coordination check passed
STAT: multi-gram: beam width set to 200 (guess) by lexicon change
STAT: wchmm (re)build completed
STAT: SR00 _default composed
STAT: [3] initialize for acoustic HMM calculation
Stat: outprob_init: state-level mixture PDFs, use calc_mix()
Stat: addlog: generating addlog table (size = 1953 kB)
Stat: addlog: addlog table generated
STAT: [4] prepare MFCC storage(s)
STAT: [5] prepare for real-time decoding
STAT: All init successfully done
STAT: ###### initialize input device
----------------------- System Information begin ---------------------
JuliusLib rev.4.3.1 (fast)
Engine specification:
- Base setup : fast
- Supported LM : DFA, N-gram, Word
- Extension : NoPThread
- Compiled by : i686-w64-mingw32-gcc -O6 -fomit-frame-pointer
------------------------------------------------------------
Configuration of Modules
Number of defined modules: AM=1, LM=1, SR=1
Acoustic Model (with input parameter spec.):
- AM00 "_default"
hmmfilename=hmm15/hmmdefs
hmmmapfilename=tiedlist
Language Model:
- LM00 "_default"
grammar #1:
dfa = sample.dfa
dict = sample.dict
Recognizer:
- SR00 "_default" (AM00, LM00)
------------------------------------------------------------
Speech Analysis Module(s)
[MFCC01] for [AM00 _default]
Acoustic analysis condition:
parameter = MFCC_0_D_N_Z (25 dim. from 12 cepstrum + c0, abs energy suppressed with CMN)
sample frequency = 16000 Hz
sample period = 625 (1 = 100ns)
window size = 400 samples (25.0 ms)
frame shift = 160 samples (10.0 ms)
pre-emphasis = 0.97
# filterbank = 24
cepst. lifter = 22
raw energy = False
energy normalize = False
delta window = 2 frames (20.0 ms) around
hi freq cut = OFF
lo freq cut = OFF
zero mean frame = OFF
use power = OFF
CVN = OFF
VTLN = OFF
spectral subtraction = off
cep. mean normalization = yes, real-time MAP-CMN, updating mean with last 0.0 sec. input
initial mean from file = N/A
beginning data weight = 100.00
cep. var. normalization = no
base setup from = Julius defaults
------------------------------------------------------------
Acoustic Model(s)
[AM00 "_default"]
HMM Info:
46 models, 126 states, 126 mpdfs, 126 Gaussians are defined
model type = context dependency handling ON
training parameter = MFCC_N_D_Z_0
vector length = 25
number of stream = 1
stream info = [0-24]
cov. matrix type = DIAGC
duration type = NULLD
max mixture size = 1 Gaussians
max length of model = 5 states
logical base phones = 41
model skip trans. = exist, require multi-path handling
skippable models = sp (1 model(s))
AM Parameters:
Gaussian pruning = safe (-gprune)
top N mixtures to calc = 2 / 0 (-tmix)
short pause HMM name = "sp" specified, "sp" applied (physical) (-sp)
cross-word CD on pass1 = handle by approx. (use max. prob. of same LC)
sp transition penalty = -70.0
------------------------------------------------------------
Language Model(s)
[LM00 "_default"] type=grammar
DFA grammar info:
6 nodes, 6 arcs, 6 terminal(category) symbols
category-pair matrix: 32 bytes (712 bytes allocated)
Vocabulary Info:
vocabulary size = 18 words, 52 models
average word len = 2.9 models, 8.7 states
maximum state num = 15 nodes per word
transparent words = not exist
words under class = not exist
Parameters:
found sp category IDs =
------------------------------------------------------------
Recognizer(s)
[SR00 "_default"] AM00 "_default" + LM00 "_default"
Lexicon tree:
total node num = 210
root node num = 18
leaf node num = 18
(-penalty1) IW penalty1 = +5.0
(-penalty2) IW penalty2 = +20.0
(-cmalpha)CM alpha coef = 0.050000
inter-word short pause = on (append "sp" for each word tail)
sp transition penalty = -70.0
Search parameters:
multi-path handling = yes, multi-path mode enabled
(-b) trellis beam width = 200 (-1 or not specified - guessed)
(-bs)score pruning thres= disabled
(-n)search candidate num= 1
(-s) search stack size = 500
(-m) search overflow = after 2000 hypotheses popped
2nd pass method = searching sentence, generating N-best
(-b2) pass2 beam width = 200
(-lookuprange)lookup range= 5 (tm-5 <= t <tm+5)
(-sb)2nd scan beamthres = 200.0 (in logscore)
(-n) search till = 1 candidates found
(-output) and output = 1 candidates out of above
IWCD handling:
1st pass: approximation (use max. prob. of same LC)
2nd pass: loose (apply when hypo. is popped and scanned)
all possible words will be expanded in 2nd pass
build_wchmm2() used
lcdset limited by word-pair constraint
short pause segmentation = off
fall back on search fail = off, returns search failure
------------------------------------------------------------
Decoding algorithm:
1st pass input processing = real time, on-the-fly
1st pass method = 1-best approx. generating indexed trellis
output word confidence measure based on search-time scores
------------------------------------------------------------
FrontEnd:
Input stream:
input type = waveform
input source = microphone
device API = default
sampling freq. = 16000 Hz
threaded A/D-in = not supported (live input may be dropped)
zero frames stripping = on
silence cutting = on
level thres = 2000 / 32767
zerocross thres = 60 / sec.
head margin = 300 msec.
tail margin = 400 msec.
chunk size = 1000 samples
long-term DC removal = off
level scaling factor = 1.00 (disabled)
reject short input = off
reject long input = off
----------------------- System Information end -----------------------
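The acoustic-analysis figures in the system information above are internally consistent; a quick sanity check in Python, with the values copied straight from the log:

```python
# Values as reported under "Acoustic analysis condition" above.
sample_period_100ns = 625                           # "sample period = 625 (1 = 100ns)"
sample_freq_hz = 1e9 / (sample_period_100ns * 100)  # 62.5 us per sample -> 16000 Hz
window_samples = 400
shift_samples = 160

window_ms = window_samples / sample_freq_hz * 1000  # -> 25.0 ms analysis window
shift_ms = shift_samples / sample_freq_hz * 1000    # -> 10.0 ms frame shift
print(sample_freq_hz, window_ms, shift_ms)
```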
Notice for feature extraction (01),
*************************************************************
* Cepstral mean normalization for real-time decoding: *
* NOTICE: The first input may not be recognized, since *
* no initial mean is available on startup. *
*************************************************************
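The first-utterance issue described in this notice can be worked around by persisting the cepstral mean between sessions with Julius's -cmnsave/-cmnload options; a sketch of the jconf lines (the filename cmn.bin is an assumption, and the options should be checked against your Julius version's documentation):

```
-cmnsave cmn.bin   # save the updated cepstral mean when input ends
-cmnload cmn.bin   # load it as the initial mean on the next startup
```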
------
### read waveform input
Stat: adin_portaudio: audio cycle buffer length = 256000 bytes
Stat: adin_portaudio: sound capture devices:
1 [MME: Microsoft Sound Mapper - Input]
2 [MME: Microphone (USB Audio Device)]
6 [Windows DirectSound: Primary Sound Capture Driver]
7 [Windows DirectSound: Microphone (USB Audio Device)]
Stat: adin_portaudio: APIs: DirectSound MME
Stat: adin_portaudio: -- DirectSound selected
Stat: adin_portaudio: [Windows DirectSound: Primary Sound Capture Driver]
Stat: adin_portaudio: (you can specify device by "PORTAUDIO_DEV_NUM=number")
Stat: adin_portaudio: try to set default low latency from portaudio: 0 msec
Stat: adin_portaudio: latency was set to 0.000000 msec
<<< please speak >>>
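As the PortAudio hint in the log notes, a specific capture device can be selected with the PORTAUDIO_DEV_NUM environment variable. A small sketch of doing that before launching Julius (device "2" matches "Microphone (USB Audio Device)" in the enumeration above, but the numbering is machine-specific, so check your own device list first):

```python
import os

# Build an environment that pins the PortAudio capture device, as the
# log's hint suggests. "2" is the USB microphone in the list above;
# substitute the number printed for your own device.
env = dict(os.environ, PORTAUDIO_DEV_NUM="2")

# Julius would then be launched with this environment, e.g. (requires
# Julius to be installed, so left commented out here):
# import subprocess
# subprocess.run(["julius-4.3.1", "-input", "mic", "-C", "Sample.jconf"], env=env)
print(env["PORTAUDIO_DEV_NUM"])
```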