Julius fails to decode proper word definition.
User: linx
Date: 12/18/2014 8:00 am

All,

I am a newbie running an SRE (Julius). The details of my setup are:

1. System Information:
Linux UbuntuServer 3.11.0-26-generic #45-Ubuntu SMP Tue Jul 15 04:02:06 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

2. Julius version and configuration:
$ julius -version
JuliusLib rev.4.3.1 (fast)

Engine specification:
- Base setup : fast
- Supported LM : DFA, N-gram, Word
- Extension : WordsInt LibSndFile
- Compiled by : gcc -O6 -fomit-frame-pointer

Library configuration: version 4.3.1
- Audio input
primary A/D-in driver : alsa (Advanced Linux Sound Architecture)
available drivers : alsa oss esd
wavefile formats : various formats by libsndfile ver.1
max. length of an input : 320000 samples, 150 words
- Language Model
class N-gram support : yes
MBR weight support : yes
word id unit : integer (4 bytes)
- Acoustic Model
multi-path treatment : autodetect
- External library
file decompression by : zlib library
- Process hangling
fork on adinnet input : no

Try `-help' for more information.
-----

3. The wavlist file lists a single .wav audio file, one minute long:
$ file cima1.wav
cima1.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 48000 Hz

4. The Julius runtime configuration file (julian.jconf) has the following options enabled:
---
-dfa sample.dfa
-v sample.dict
-h hmm15/hmmdefs
-hlist tiedlist
-penalty1 5.0 # first pass
-penalty2 20.0 # second pass
-notypecheck
-iwcd1 max
-gprune safe
-b2 200
-sb 200.0
-spmodel "sp" # HMM model name
-smpFreq 48000 # sampling rate (Hz)
-----
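
For reference, the effect of -smpFreq 48000 on the analysis frames can be worked out by hand. The following is a minimal Python sketch, assuming the window size (400 samples) and frame shift (160 samples) that Julius reports in its log in step 5. The 16000 Hz comparison rate is an assumption for illustration only; nothing in this post states the rate the acoustic model was trained at.

```python
def frame_ms(samples: int, rate_hz: int) -> float:
    """Return the duration in milliseconds of `samples` samples at `rate_hz`."""
    return 1000.0 * samples / rate_hz


WINDOW_SAMPLES = 400  # "window size" reported in the Julius log
SHIFT_SAMPLES = 160   # "frame shift" reported in the Julius log

# 48000 Hz is the value of -smpFreq; 16000 Hz is an assumed comparison rate.
for rate in (48000, 16000):
    window = frame_ms(WINDOW_SAMPLES, rate)
    shift = frame_ms(SHIFT_SAMPLES, rate)
    print(f"{rate} Hz: window {window:.1f} ms, shift {shift:.1f} ms")
```

At 48000 Hz this reproduces the 8.3 ms window and 3.3 ms shift shown in the log; the same sample counts at 16000 Hz would span 25.0 ms and 10.0 ms.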

5. The Julius command and its output:
$ julius -input rawfile -filelist wavlist -C julian.jconf > JuliusOutput

STAT: include config: julian.jconf
STAT: jconf successfully finalized
STAT: *** loading AM00 _default
Stat: init_phmm: Reading in HMM definition
Stat: rdhmmdef: ascii format HMM definition
Stat: rdhmmdef: limit check passed
Stat: check_hmm_restriction: an HMM with several arcs from initial state found: "sp"
Stat: rdhmmdef: this HMM requires multipath handling at decoding
Stat: rdhmmdef: no <SID> embedded
Stat: rdhmmdef: assign SID by the order of appearance
Stat: init_phmm: defined HMMs: 44
Stat: init_phmm: loading ascii hmmlist
Stat: init_phmm: logical names: 285 in HMMList
Stat: init_phmm: base phones: 41 used in logical
Stat: init_phmm: finished reading HMM definitions
STAT: making pseudo bi/mono-phone for IW-triphone
Stat: hmm_lookup: 174 pseudo phones are added to logical HMM list
STAT: *** AM00 _default loaded
STAT: *** loading LM00 _default
STAT: reading [sample.dfa] and [sample.dict]...
Stat: init_voca: read 93 words
STAT: done
STAT: Gram #0 sample registered
STAT: Gram #0 sample: new grammar loaded, now mash it up for recognition
STAT: Gram #0 sample: extracting category-pair constraint for the 1st pass
STAT: Gram #0 sample: installed
STAT: Gram #0 sample: turn on active
STAT: grammar update completed
STAT: *** LM00 _default loaded
STAT: ------
STAT: All models are ready, go for final fusion
STAT: [1] create MFCC extraction instance(s)
STAT: *** create MFCC calculation modules from AM
STAT: AM 0 _default: create a new module MFCC01
STAT: 1 MFCC modules created
STAT: [2] create recognition processing instance(s) with AM and LM
STAT: composing recognizer instance SR00 _default (AM00 _default, LM00 _default)
STAT: Building HMM lexicon tree
WARNING: IW-triphone for word end "l-er+ey" not found, fallback to pseudo {l-er}
WARNING: IW-triphone for word end "l-er+ey" not found, fallback to pseudo {l-er}
WARNING: IW-triphone for word end "l-er+f" not found, fallback to pseudo {l-er}
WARNING: IW-triphone for word end "l-er+s" not found, fallback to pseudo {l-er}
WARNING: IW-triphone for word end "l-er+s" not found, fallback to pseudo {l-er}
WARNING: IW-triphone for word end "l-er+th" not found, fallback to pseudo {l-er}
WARNING: IW-triphone for word end "l-er+th" not found, fallback to pseudo {l-er}
WARNING: IW-triphone for word end "l-er+t" not found, fallback to pseudo {l-er}
WARNING: wchmm: no lcdset found for [dh-ah::0008], fallback to [dh-ah]
WARNING: wchmm: no lcdset found for [ah-ch::0008], fallback to [ah-ch]
WARNING: wchmm: no lcdset found for [l-er::0016], fallback to [l-er]
WARNING: wchmm: no lcdset found for [d-ey::0025], fallback to [d-ey]
STAT: lexicon size: 1172 nodes
STAT: coordination check passed
STAT: multi-gram: beam width set to 200 (guess) by lexicon change
STAT: wchmm (re)build completed
STAT: SR00 _default composed
STAT: [3] initialize for acoustic HMM calculation
Stat: outprob_init: state-level mixture PDFs, use calc_mix()
Stat: addlog: generating addlog table (size = 1953 kB)
Stat: addlog: addlog table generated
STAT: [4] prepare MFCC storage(s)
STAT: All init successfully done

STAT: ###### initialize input device
----------------------- System Information begin ---------------------
JuliusLib rev.4.3.1 (fast)

Engine specification:
- Base setup : fast
- Supported LM : DFA, N-gram, Word
- Extension : WordsInt LibSndFile
- Compiled by : gcc -O6 -fomit-frame-pointer

------------------------------------------------------------
Configuration of Modules

Number of defined modules: AM=1, LM=1, SR=1

Acoustic Model (with input parameter spec.):
- AM00 "_default"
hmmfilename=hmm15/hmmdefs
hmmmapfilename=tiedlist

Language Model:
- LM00 "_default"
grammar #1:
dfa = sample.dfa
dict = sample.dict

Recognizer:
- SR00 "_default" (AM00, LM00)

------------------------------------------------------------
Speech Analysis Module(s)

[MFCC01] for [AM00 _default]

Acoustic analysis condition:
parameter = MFCC_0_D_N_Z (25 dim. from 12 cepstrum + c0, abs energy supressed with CMN)
sample frequency = 48000 Hz
sample period = 208 (1 = 100ns)
window size = 400 samples (8.3 ms)
frame shift = 160 samples (3.3 ms)
pre-emphasis = 0.97
# filterbank = 24
cepst. lifter = 22
raw energy = False
energy normalize = False
delta window = 2 frames (6.7 ms) around
hi freq cut = OFF
lo freq cut = OFF
zero mean frame = OFF
use power = OFF
CVN = OFF
VTLN = OFF

spectral subtraction = off

cep. mean normalization = yes, with per-utterance self mean
cep. var. normalization = no

base setup from = Julius defaults

------------------------------------------------------------
Acoustic Model(s)

[AM00 "_default"]

HMM Info:
44 models, 121 states, 121 mpdfs, 121 Gaussians are defined
model type = context dependency handling ON
training parameter = MFCC_N_D_Z_0
vector length = 25
number of stream = 1
stream info = [0-24]
cov. matrix type = DIAGC
duration type = NULLD
max mixture size = 1 Gaussians
max length of model = 5 states
logical base phones = 41
model skip trans. = exist, require multi-path handling
skippable models = sp (1 model(s))

AM Parameters:
Gaussian pruning = safe (-gprune)
top N mixtures to calc = 2 / 0 (-tmix)
short pause HMM name = "sp" specified, "sp" applied (physical) (-sp)
cross-word CD on pass1 = handle by approx. (use max. prob. of same LC)
sp transition penalty = -1.0

------------------------------------------------------------

Language Model(s)

[LM00 "_default"] type=grammar

DFA grammar info:
28 nodes, 27 arcs, 27 terminal(category) symbols
category-pair matrix: 112 bytes (2752 bytes allocated)

Vocabulary Info:
vocabulary size = 93 words, 347 models
average word len = 3.7 models, 11.2 states
maximum state num = 30 nodes per word
transparent words = not exist
words under class = not exist

Parameters:
found sp category IDs =

------------------------------------------------------------
Recognizer(s)

[SR00 "_default"] AM00 "_default" + LM00 "_default"

Lexicon tree:
total node num = 1172
root node num = 86
leaf node num = 93

(-penalty1) IW penalty1 = +5.0
(-penalty2) IW penalty2 = +20.0
(-cmalpha)CM alpha coef = 0.050000

Search parameters:
multi-path handling = yes, multi-path mode enabled
(-b) trellis beam width = 200 (-1 or not specified - guessed)
(-bs)score pruning thres= disabled
(-n)search candidate num= 1
(-s) search stack size = 500
(-m) search overflow = after 2000 hypothesis poped
2nd pass method = searching sentence, generating N-best
(-b2) pass2 beam width = 200
(-lookuprange)lookup range= 5 (tm-5 <= t <tm+5)
(-sb)2nd scan beamthres = 200.0 (in logscore)
(-n) search till = 1 candidates found
(-output) and output = 1 candidates out of above
IWCD handling:
1st pass: approximation (use max. prob. of same LC)
2nd pass: loose (apply when hypo. is popped and scanned)
all possible words will be expanded in 2nd pass
build_wchmm2() used
lcdset limited by word-pair constraint
short pause segmentation = off
fall back on search fail = off, returns search failure

------------------------------------------------------------
Decoding algorithm:

1st pass input processing = buffered, batch
1st pass method = 1-best approx. generating indexed trellis
output word confidence measure based on search-time scores

------------------------------------------------------------
FrontEnd:

Input stream:
input type = waveform
input source = waveform file
input filelist = wavlist
sampling freq. = 48000 Hz required
threaded A/D-in = supported, off
zero frames stripping = on
silence cutting = off
long-term DC removal = off
long-term DC removal = off
level scaling factor = 1.00 (disabled)
reject short input = off
reject long input = off

----------------------- System Information end -----------------------

Notice for feature extraction (01),
*************************************************************
* Cepstral mean normalization for batch decoding: *
* per-utterance mean will be computed and applied. *
*************************************************************

------
### read waveform input
Stat: adin_sndfile: input speechfile: ../train/wav/cima1.wav
Stat: adin_sndfile: input format = Microsoft WAV
Stat: adin_sndfile: input type = Signed 16 bit PCM
Stat: adin_sndfile: endian = file native endian
Stat: adin_sndfile: 48000 Hz, 1 channels
Warning: strip: sample 4733-4758 has zero value, stripped
Warning: strip: sample 413-429 has zero value, stripped
Warning: strip: sample 238-253 has zero value, stripped
Warning: strip: sample 427-444 has zero value, stripped
Warning: strip: sample 963-979 has zero value, stripped
STAT: 2879906 samples (60.00 sec.)
STAT: ### speech analysis (waveform -> MFCC)
### Recognition: 1st pass (LR beam)
pass1_best: <s> HELP GOTTEN PERFECTLY WHEN RECORDS IN MUCH
pass1_best_wordseq: 0 2 3 4 5 6 7 8
pass1_best_phonemeseq: sil | hh eh l p | g aa t en | p er f ih k t l iy | w eh n | r eh k er d z | ix n | m ah ch
pass1_best_score: -486593.375000
### Recognition: 2nd pass (RL heuristic best-first)
WARNING: 00 _default: hypothesis stack exhausted, terminate search now
STAT: 00 _default: 0 sentences have been found
WARNING: 00 _default: got no candidates, search failed
STAT: 00 _default: 0 generated, 0 pushed, 0 nodes popped in 17997
<search failed>


------
### read waveform input
-----

6. I then processed the Julius output:
$ ./processjuliusoutput.pl JuliusOutput JuliusProcessed

The resulting JuliusProcessed file contains:

#!MLF!#
"*/Stat: adin_sndfile: ../train/X1.rec"

7. The command I ran to evaluate the recognition results:
$ HResults -I words.mlf tiedlist JuliusProcessed
ERROR [+6510] LOpen: Unable to open label file Stat: adin_sndfile: ../train/X1.rec
FATAL ERROR - Terminating program HResults


I do not understand the errors shown above (highlighted in color in the original post). <---- QUERY#1
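
The label name in JuliusProcessed contains log text ("Stat: adin_sndfile:") rather than a plain file name, and HResults then tries to open exactly that string. How processjuliusoutput.pl parses the log is not shown here, so the following is only a hedged Python sketch of one way an MLF could be derived from a Julius log. It assumes the "input speechfile:" line shown in step 5 and a "sentence1:" result line, which does not appear in this log because the second pass failed. The function names are hypothetical, and dropping the <s> and </s> markers is an assumption.

```python
import os
import re

SPEECHFILE = re.compile(r"input speechfile:\s*(\S+)")
SENTENCE = re.compile(r"^sentence1:\s*(.*)$")


def parse_julius_log(log_text):
    """Return a list of (base_name, words) pairs, one per input file."""
    utterances = []
    for line in log_text.splitlines():
        match = SPEECHFILE.search(line)
        if match:
            # Keep only the file's base name, e.g. "cima1" from ".../cima1.wav".
            base = os.path.splitext(os.path.basename(match.group(1)))[0]
            utterances.append((base, []))
            continue
        match = SENTENCE.match(line.strip())
        if match and utterances:
            # Assumption: sentence-boundary markers are not wanted in the MLF.
            words = [w for w in match.group(1).split() if w not in ("<s>", "</s>")]
            utterances[-1][1].extend(words)
    return utterances


def to_mlf(utterances):
    """Render the parsed utterances as MLF text with .rec label names."""
    out = ["#!MLF!#"]
    for base, words in utterances:
        out.append(f'"*/{base}.rec"')
        out.extend(words)
        out.append(".")
    return "\n".join(out) + "\n"
```

With the log in step 5, this sketch would produce an entry named "*/cima1.rec" with no words, because the search failed and no "sentence1:" line was written.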

8. The contents of words.mlf:
-----
#!MLF!#
"*/X1.lab"
THANK
YOU
SO
MUCH
FOR
CALLING
COMCAST
THIS
IS
DEXTER
SPEAKING
HOW
CAN
I
HELP
YOU
DEXTER
MY
NAME
IS
NATALIE
CAN
YOU
UNDERSTAND
ME
BECAUSE
I
HAVE
A
VERY
BAD
COLD
NO
DON'T
WORRY
I
TOTALLY
UNDERSTAND
YOU
PERFECTLY
YOU
CAN
UNDERSTAND
ME
OK
YES
HERE
IS
THE
SITUATION
OK
I
CALLED
COMCAST
I
THINK
IT
WAS
LAST
FRIDAY
AND
I'D
LIKE
YOU
TO
CHECK
MY
RECORDS
FOR
WHEN
WE
TALKED
BECAUSE
I
HAD
GOTTEN
A
NOTICE
IN
THE
MAIL
THAT
I
WAS
DELINQUENT
IN
MY
PAYMENT
BUT
I
HAD
MAILED
YOU
A
CHECK
FOR
EIGHTY
SIX
DOLLARS
AND
SIXTY
CENTS
ON
DECEMBER
TWENTY
THIRD
TWENTY
THIRTEEN
CHECK
NUMBER
FOUR
SIX
EIGHT
.
-----
This does not match the Julius first-pass output: "pass1_best: <s> HELP GOTTEN PERFECTLY WHEN RECORDS IN MUCH" <---- QUERY#2
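
To put a number on such a mismatch, a reference transcript and a hypothesis can be aligned at the word level. The following is a minimal Python sketch of a plain word-level edit distance (substitutions, deletions, insertions each costing 1). It is not HResults' own weighted alignment, and the function name and the short example word lists are illustrative assumptions, not the full transcripts.

```python
def word_edit_distance(ref, hyp):
    """Return the minimum number of word edits turning `ref` into `hyp`."""
    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion of a reference word
                           cur[j - 1] + 1,       # insertion of a hypothesis word
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1]


# Short illustrative fragments only.
ref = "HOW CAN I HELP YOU".split()
hyp = "HELP GOTTEN".split()
print(word_edit_distance(ref, hyp))
```

Dividing the distance by the number of reference words gives a rough word error rate for the pair.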

Please help with QUERY#1 and QUERY#2.

Thanks!
