VoxForge
Dear Ken, I have trained hmm models using HTK.
Once I had the hmm12 directory, I decided to test the models with Julius against some test files.
I ran the command:
julian -input rawfile -filelist testWavsList -smpFreq 48000 -C julian.jconf> recognitionOutput
and got the following output:
include config: julian.jconf
###### check configurations
###### initialize input device
###### build up system
Reading in HMM definition...(ascii)...limit check passed
defined HMMs: 316
logical names: 316
base phones: 42 used in logical
done
Making pseudo bi/mono-phone for IW-triphone...305 added as logical...done
Reading in dictionary...
8 words...done
Reading in DFA grammar...done
Mapping dict item <-> DFA terminal (category)...done
Building HMM lexicon tree........90 nodes
coordination check passed
done
Generating addlog table...1953 kb...done
All init successfully done
### read waveform input
### speech analysis (waveform -> MFCC)
### Recognition: 1st pass (LR beam with word-pair grammar)
Error: 428th frame: no nodes left in beam! model mismatch or wrong input?
### Recognition: 2nd pass (RL heuristic best-first with DFA)
Any idea what this means:
Error: 428th frame: no nodes left in beam! model mismatch or wrong input?
I get the same error, and the same recognition result, for all my test files: a sentence consisting only of the word DIAL, which is obviously wrong.
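To confirm that every test file fails the same way, the saved output can be searched for the error line. This is only a minimal sketch: the function name count_beam_errors is made up for illustration, and it assumes each failed utterance logs exactly one "no nodes left in beam" line, as in the output above.

```shell
# Count first-pass beam failures in a saved Julius/Julian log.
# count_beam_errors is a hypothetical helper name, not part of Julius.
count_beam_errors() {
    # grep -c prints the number of matching lines. It exits non-zero
    # when there are no matches, so "|| true" keeps the function from
    # aborting a script that runs under "set -e".
    grep -c 'no nodes left in beam' "$1" || true
}

# recognitionOutput is the file written by the command above.
if [ -f recognitionOutput ]; then
    count_beam_errors recognitionOutput
fi
```

If the count equals the number of files in testWavsList, the failure is systematic rather than specific to one recording.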
>Once I had the hmm12 directory, I decided to test the models with Julius
>against some test files.
I've never tried this. I have tested against the hmm9 models from Step 8 (monophone acoustic models) and the hmm15 models from Step 10, and had no problems (though the hmm9 models were not that accurate).
>### Recognition: 1st pass (LR beam with word-pair grammar)
>Error: 428th frame: no nodes left in beam! model mismatch or wrong input?
Have you tried modifying the beam width parameter in Julius?
-b beamwidth
    Beam width (number of HMM nodes) on the first pass. This value
    defines search width on the 1st pass, and has great effect on
    the total processing time. Smaller width will speed up the
    decoding, but too small value will result in a substantial
    increase of recognition errors due to search failure. Larger
    value will make the search stable and will lead to failure-free
    search, but processing time and memory usage will grow in
    proportion to the width.

    Default value is acoustic model dependent:
        400 (monophone)
        800 (triphone,PTM)
        1000 (triphone,PTM, setup=v2.1)
Or why not just finish the acoustic model training and use tied-state triphones?
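If you want to try the -b option first, something like the sketch below re-runs your original command with several beam widths. The function name run_with_beams and the output file names are made up for illustration, and the sketch assumes that a -b given on the command line after -C takes precedence over any beam setting in julian.jconf.

```shell
# Re-run the original decode once per beam width.
# run_with_beams is a hypothetical helper name, not part of Julius.
# Usage: run_with_beams WIDTH...
run_with_beams() {
    for width in "$@"; do
        # Every option except -b is taken from the original command.
        # Each run writes to its own file so the results can be compared.
        julian -input rawfile -filelist testWavsList -smpFreq 48000 \
            -C julian.jconf -b "$width" > "recognitionOutput.b$width"
    done
}
```

For example, "run_with_beams 800 1500 3000" produces recognitionOutput.b800, recognitionOutput.b1500 and recognitionOutput.b3000. Those widths are arbitrary starting points, not recommended values. If the "no nodes left in beam" error persists even at large widths, the beam is probably not the cause.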