VoxForge
Hello!
I'm trying to test Julius with julius-voxforge. When I start recognition, Julius runs but I never get any recognition result. On startup I get the following messages, warnings and errors:
### read waveform input
Stat: capture audio at 16000Hz
Stat: adin_alsa: latency set to 32 msec (chunk = 512 bytes)
Error: adin_alsa: unable to get pcm info from card control
Warning: adin_alsa: skip output of detailed audio device info
STAT: AD-in thread created
The program keeps running, but while I'm speaking I get these warnings:
Warning: strip: sample 287-302 has zero value, stripped
Warning: strip: sample 32-47 has zero value, stripped
Warning: strip: sample 251-266 has zero value, stripped
Warning: strip: sample 497-512 has zero value, stripped
Warning: strip: sample 563-579 has zero value, stripped
Warning: strip: sample 53-68 has zero value, stripped
Warning: strip: sample 196-212 has zero value, stripped
Warning: strip: sample 341-356 has zero value, stripped
Warning: strip: sample 606-621 has zero value, stripped
and after a while I get these warnings:
WARNING: adin_thread_process: too long input (> 320000 samples), segmented now
Warning: input buffer overflow: some input may be dropped, so disgard the input
and the program continues in an infinite loop.
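To put numbers on the warnings above, here is a rough Python sketch that reads the "strip" lines and converts sample counts to time. It assumes the reported range is inclusive (so 287-302 is 16 samples) and uses the 16000 Hz rate from the "capture audio" line; the function name and the regular expression are mine, not part of Julius.

```python
import re

# Sampling rate taken from the log line "Stat: capture audio at 16000Hz".
SAMPLE_RATE_HZ = 16000

# Matches the numeric range in lines such as
# "Warning: strip: sample 287-302 has zero value, stripped".
STRIP_RE = re.compile(r"strip: sample (\d+)-(\d+) has zero value")


def zero_run_lengths(log_text):
    """Return the length in samples of each zero run reported in the log.

    Assumption: the range printed in the warning is inclusive on both ends.
    """
    return [int(end) - int(start) + 1 for start, end in STRIP_RE.findall(log_text)]


# Three of the warnings quoted in the post.
log = """\
Warning: strip: sample 287-302 has zero value, stripped
Warning: strip: sample 32-47 has zero value, stripped
Warning: strip: sample 563-579 has zero value, stripped
"""

lengths = zero_run_lengths(log)
print(lengths)                               # [16, 16, 17]
print(min(lengths) * 1000 / SAMPLE_RATE_HZ)  # shortest run in ms: 1.0
print(320000 / SAMPLE_RATE_HZ)               # "too long input" limit in s: 20.0
```

Read this way, each stripped run is only about a millisecond of audio, and the "too long input" warning corresponds to 20 seconds of input without a detected end of utterance.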
Also, I ran the program with the -record option, and when I open the recorded file in VLC I can hear the sound fine.
Does anybody have an idea of what I can do to get it working?
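Listening to the file will not reveal millisecond-long gaps, so a quick scan for runs of zero samples may be more telling. The sketch below is only an illustration: it assumes a 16-bit mono PCM WAV file, and the threshold of 16 samples is a guess taken from the shortest ranges in the warnings, not a value confirmed from Julius. Both function names are hypothetical. A file written while stripping is enabled may already have the zero runs removed, so a capture made with another tool from the same device may be a better input.

```python
import sys
import wave
from array import array


def read_pcm16_mono(path):
    """Load a 16-bit mono PCM WAV file into an array of signed samples."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        samples = array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    # WAV data is little-endian; swap bytes on big-endian hosts.
    if sys.byteorder == "big":
        samples.byteswap()
    return samples


def find_zero_runs(samples, min_run=16):
    """Return (first, last) index pairs for runs of at least min_run zeros.

    min_run=16 is an assumption based on the shortest ranges in the log.
    """
    runs = []
    start = None
    for i, value in enumerate(samples):
        if value == 0:
            if start is None:
                start = i  # a new run of zeros begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Handle a run that extends to the end of the data.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - 1))
    return runs
```

For example, `find_zero_runs(read_pcm16_mono("recording.wav"))` (the file name is a placeholder) lists the zero runs in a recording. Many short runs spread through the speech would suggest the capture device is delivering dropouts rather than real silence.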
Can anybody help me, please?
Thanks!
Diego
Output:
********************************************************************
STAT: include config: julian.jconf
STAT: jconf successfully finalized
STAT: *** loading AM00 _default
Stat: init_phmm: Reading in HMM definition
Stat: rdhmmdef: ascii format HMM definition
Stat: rdhmmdef: limit check passed
Stat: check_hmm_restriction: an HMM with several arcs from initial state found: "sp"
Stat: rdhmmdef: this HMM requires multipath handling at decoding
Stat: init_phmm: defined HMMs: 8002
Stat: init_phmm: loading ascii hmmlist
Stat: init_phmm: logical names: 9406 in HMMList
Stat: init_phmm: base phones: 44 used in logical
Stat: init_phmm: finished reading HMM definitions
STAT: making pseudo bi/mono-phone for IW-triphone
Stat: hmm_lookup: 1085 pseudo phones are added to logical HMM list
STAT: *** AM00 _default loaded
STAT: *** loading LM00 _default
STAT: reading [sample.dfa] and [sample.dict]...
Stat: init_voca: read 25 words
STAT: done
STAT: Gram #0 sample registered
STAT: Gram #0 sample: new grammar loaded, now mash it up for recognition
STAT: Gram #0 sample: extracting category-pair constraint for the 1st pass
STAT: Gram #0 sample: installed
STAT: Gram #0 sample: turn on active
STAT: grammar update completed
STAT: *** LM00 _default loaded
STAT: ------
STAT: All models are ready, go for final fusion
STAT: [1] create MFCC extraction instance(s)
STAT: *** create MFCC calculation modules from AM
STAT: AM 0 _default: create a new module MFCC01
STAT: 1 MFCC modules created
STAT: [2] create recognition processing instance(s) with AM and LM
STAT: composing recognizer instance SR00 _default (AM00 _default, LM00 _default)
STAT: Building HMM lexicon tree
STAT: lexicon size: 313 nodes
STAT: coordination check passed
STAT: multi-gram: beam width set to 200 (guess) by lexicon change
STAT: wchmm (re)build completed
STAT: SR00 _default composed
STAT: [3] initialize for acoustic HMM calculation
Stat: outprob_init: state-level mixture PDFs, use calc_mix()
Stat: addlog: generating addlog table (size = 1953 kB)
Stat: addlog: addlog table generated
STAT: [4] prepare MFCC storage(s)
STAT: [5] prepare for real-time decoding
STAT: All init successfully done
Input speech data will be stored to = ./grabaciones/
STAT: ###### initialize input device
----------------------- System Information begin ---------------------
JuliusLib rev.4.2.1 (fast)
Engine specification:
- Base setup : fast
- Supported LM : DFA, N-gram, Word
- Extension :
- Compiled by : gcc -g -O2 -fPIC -fPIC
------------------------------------------------------------
Configuration of Modules
Number of defined modules: AM=1, LM=1, SR=1
Acoustic Model (with input parameter spec.):
- AM00 "_default"
hmmfilename=/usr/share/julius-voxforge/acoustic/hmmdefs
hmmmapfilename=/usr/share/julius-voxforge/acoustic/tiedlist
Language Model:
- LM00 "_default"
grammar #1:
dfa = sample.dfa
dict = sample.dict
Recognizer:
- SR00 "_default" (AM00, LM00)
------------------------------------------------------------
Speech Analysis Module(s)
[MFCC01] for [AM00 _default]
Acoustic analysis condition:
parameter = MFCC_0_D_N_Z (25 dim. from 12 cepstrum + c0, abs energy supressed with CMN)
sample frequency = 16000 Hz
sample period = 625 (1 = 100ns)
window size = 400 samples (25.0 ms)
frame shift = 160 samples (10.0 ms)
pre-emphasis = 0.97
# filterbank = 24
cepst. lifter = 22
raw energy = False
energy normalize = False
delta window = 2 frames (20.0 ms) around
hi freq cut = OFF
lo freq cut = OFF
zero mean frame = OFF
use power = OFF
CVN = OFF
VTLN = OFF
spectral subtraction = off
cepstral normalization = real-time MAP-CMN
base setup from = Julius defaults
MAP-CMN:
initial cep. data = none
beginning data weight = 100.00
beginning data update = yes, from last inputs at each input
------------------------------------------------------------
Acoustic Model(s)
[AM00 "_default"]
HMM Info:
8002 models, 5950 states, 5950 mpdfs, 5950 Gaussians are defined
model type = context dependency handling ON
training parameter = MFCC_N_D_Z_0
vector length = 25
number of stream = 1
stream info = [0-24]
cov. matrix type = DIAGC
duration type = NULLD
max mixture size = 1 Gaussians
max length of model = 5 states
logical base phones = 44
model skip trans. = exist, require multi-path handling
skippable models = sp (1 model(s))
AM Parameters:
Gaussian pruning = safe (-gprune)
top N mixtures to calc = 2 / 0 (-tmix)
short pause HMM name = "sp" specified, "sp" applied (physical) (-sp)
cross-word CD on pass1 = handle by approx. (use max. prob. of same LC)
sp transition penalty = -70.0
------------------------------------------------------------
Language Model(s)
[LM00 "_default"] type=grammar
DFA grammar info:
9 nodes, 19 arcs, 11 terminal(category) symbols
category-pair matrix: 104 bytes (1216 bytes allocated)
Vocabulary Info:
vocabulary size = 25 words, 85 models
average word len = 3.4 models, 10.2 states
maximum state num = 24 nodes per word
transparent words = not exist
words under class = not exist
Parameters:
found sp category IDs =
------------------------------------------------------------
Recognizer(s)
[SR00 "_default"] AM00 "_default" + LM00 "_default"
Lexicon tree:
total node num = 313
root node num = 23
leaf node num = 25
(-penalty1) IW penalty1 = +5.0
(-penalty2) IW penalty2 = +20.0
(-cmalpha)CM alpha coef = 0.050000
inter-word short pause = on (append "sp" for each word tail)
sp transition penalty = -70.0
Search parameters:
multi-path handling = yes, multi-path mode enabled
(-b) trellis beam width = 200 (-1 or not specified - guessed)
(-bs)score pruning thres= disabled
(-n)search candidate num= 1
(-s) search stack size = 500
(-m) search overflow = after 2000 hypothesis poped
2nd pass method = searching sentence, generating N-best
(-b2) pass2 beam width = 200
(-lookuprange)lookup range= 5 (tm-5 <= t <tm+5)
(-sb)2nd scan beamthres = 200.0 (in logscore)
(-n) search till = 1 candidates found
(-output) and output = 1 candidates out of above
IWCD handling:
1st pass: approximation (use max. prob. of same LC)
2nd pass: loose (apply when hypo. is popped and scanned)
all possible words will be expanded in 2nd pass
build_wchmm2() used
lcdset limited by word-pair constraint
short pause segmentation = off
fall back on search fail = off, returns search failure
------------------------------------------------------------
Decoding algorithm:
1st pass input processing = real time, on-the-fly
1st pass method = 1-best approx. generating indexed trellis
output word confidence measure based on search-time scores
------------------------------------------------------------
FrontEnd:
Input stream:
input type = waveform
input source = microphone
device API = default
sampling freq. = 16000 Hz
threaded A/D-in = supported, on
zero frames stripping = on
silence cutting = on
level thres = 2000 / 32767
zerocross thres = 60 / sec.
head margin = 300 msec.
tail margin = 400 msec.
chunk size = 1000 samples
long-term DC removal = off
reject short input = off
----------------------- System Information end -----------------------
*************************************************************
* NOTICE: The first input may not be recognized, since *
* no initial CMN parameter is available on startup. *
* for MFCC01*
*************************************************************
------
### read waveform input
Stat: capture audio at 16000Hz
Stat: adin_alsa: latency set to 32 msec (chunk = 512 bytes)
Error: adin_alsa: unable to get pcm info from card control
Warning: adin_alsa: skip output of detailed audio device info
STAT: AD-in thread created
Warning: strip: sample 106-122 has zero value, stripped
Warning: strip: sample 228-244 has zero value, stripped
Warning: strip: sample 351-366 has zero value, stripped
Warning: strip: sample 281-321 has zero value, stripped
Warning: strip: sample 396-411 has zero value, stripped
Warning: strip: sample 532-547 has zero value, stripped
Warning: strip: sample 26-44 has zero value, stripped
Warning: strip: sample 377-396 has zero value, stripped
Warning: strip: sample 415-431 has zero value, stripped
Warning: strip: sample 505-523 has zero value, stripped
Warning: strip: sample 606-623 has zero value, stripped
Warning: strip: sample 251-270 has zero value, stripped
Warning: strip: sample 51-66 has zero value, stripped
Warning: strip: sample 477-492 has zero value, stripped
Warning: strip: sample 619-634 has zero value, stripped
Warning: strip: sample 166-189 has zero value, stripped
Warning: strip: sample 298-318 has zero value, stripped
Warning: strip: sample 414-432 has zero value, stripped
Warning: strip: sample 530-546 has zero value, stripped
Warning: strip: sample 8-23 has zero value, stripped
Warning: strip: sample 604-620 has zero value, stripped
Warning: strip: sample 38-53 has zero value, stripped
Warning: strip: sample 67-83 has zero value, stripped
Warning: strip: sample 154-171 has zero value, stripped
Warning: strip: sample 183-200 has zero value, stripped
Warning: strip: sample 270-286 has zero value, stripped
Warning: strip: sample 351-369 has zero value, stripped
Warning: strip: sample 422-437 has zero value, stripped
Warning: strip: sample 468-487 has zero value, stripped
Warning: strip: sample 511-526 has zero value, stripped
Warning: strip: sample 588-605 has zero value, stripped
Warning: strip: sample 202-218 has zero value, stripped
Warning: strip: sample 259-274 has zero value, stripped
Warning: strip: sample 347-362 has zero value, stripped
Warning: strip: sample 544-559 has zero value, stripped
Warning: strip: sample 29-44 has zero value, stripped
Warning: strip: sample 170-185 has zero value, stripped
Warning: strip: sample 284-299 has zero value, stripped
Warning: strip: sample 294-309 has zero value, stripped
Warning: strip: sample 315-331 has zero value, stripped
Warning: strip: sample 531-546 has zero value, stripped
Warning: strip: sample 116-133 has zero value, stripped
Warning: strip: sample 243-260 has zero value, stripped
Warning: strip: sample 341-358 has zero value, stripped
Warning: strip: sample 413-428 has zero value, stripped
Warning: strip: sample 110-131 has zero value, stripped
Warning: strip: sample 19-34 has zero value, stripped
Warning: strip: sample 152-168 has zero value, stripped
Warning: strip: sample 286-302 has zero value, stripped
Warning: strip: sample 557-572 has zero value, stripped
Warning: strip: sample 55-70 has zero value, stripped
Warning: strip: sample 155-170 has zero value, stripped
Warning: strip: sample 420-437 has zero value, stripped
Warning: strip: sample 337-352 has zero value, stripped
Warning: strip: sample 618-635 has zero value, stripped
Warning: strip: sample 117-135 has zero value, stripped
Warning: strip: sample 254-270 has zero value, stripped
Warning: strip: sample 98-115 has zero value, stripped
Warning: strip: sample 445-460 has zero value, stripped
Warning: strip: sample 580-597 has zero value, stripped
Warning: strip: sample 70-95 has zero value, stripped
Warning: strip: sample 135-150 has zero value, stripped
Warning: strip: sample 202-235 has zero value, stripped
Warning: strip: sample 332-366 has zero value, stripped
Warning: strip: sample 463-498 has zero value, stripped
Warning: strip: sample 595-627 has zero value, stripped
Warning: strip: sample 89-120 has zero value, stripped
Warning: strip: sample 229-256 has zero value, stripped
Warning: strip: sample 590-611 has zero value, stripped
Warning: strip: sample 0-16 has zero value, stripped
Warning: strip: sample 76-92 has zero value, stripped
Warning: strip: sample 121-138 has zero value, stripped
Warning: strip: sample 202-217 has zero value, stripped
Warning: strip: sample 246-264 has zero value, stripped
Warning: strip: sample 329-345 has zero value, stripped
Warning: strip: sample 585-604 has zero value, stripped
Warning: strip: sample 622-646 has zero value, stripped
Warning: strip: sample 63-79 has zero value, stripped
Warning: strip: sample 101-118 has zero value, stripped
Warning: strip: sample 191-209 has zero value, stripped
Warning: strip: sample 230-249 has zero value, stripped
Warning: strip: sample 321-338 has zero value, stripped
Warning: strip: sample 360-378 has zero value, stripped
Warning: strip: sample 451-469 has zero value, stripped
Warning: strip: sample 490-507 has zero value, stripped
Warning: strip: sample 582-599 has zero value, stripped
Warning: strip: sample 61-79 has zero value, stripped
Warning: strip: sample 44-61 has zero value, stripped
Warning: strip: sample 173-191 has zero value, stripped
Warning: strip: sample 304-321 has zero value, stripped
Warning: strip: sample 435-453 has zero value, stripped
Warning: strip: sample 567-585 has zero value, stripped
Warning: strip: sample 138-153 has zero value, stripped
Warning: strip: sample 407-423 has zero value, stripped
Warning: strip: sample 542-559 has zero value, stripped
Warning: strip: sample 606-626 has zero value, stripped
Warning: strip: sample 41-58 has zero value, stripped
Warning: strip: sample 105-125 has zero value, stripped
WARNING: adin_thread_process: too long input (> 320000 samples), segmented now
Warning: input buffer overflow: some input may be dropped, so disgard the input
--- (Edited on 1/30/2012 8:28 pm [GMT-0600] by ) ---
For the message "Warning: strip: sample 287-302 has zero value, stripped", see the command-line/config variable -nostrip ("disable stripping off zero samples").
--
Dr Tony Robinson
Founder Cantab Research Ltd
http://www.cantabResearch.com
--- (Edited on 31-January-2012 7:25 am [GMT+0000] by TonyR) ---