VoxForge
Hi All,
I have pre-recorded audio files for which the HTK acoustic model files (tiedlist and hmmdefs) were generated without problems.
The Julius (v4.3.1) X.grammar file is:
--
S : NS_B CONVERSATION NS_E
CONVERSATION: ACTION ADJECTIVE ADVERB AFFIRMATIVE COMMON CONJUNCTION DETERMINER EXCLAMATION INDEFINITE INFINITIVE INTERJECTION MODAL MONTH NEGATIVES NOUN NUMBER PARTICIPLE PERFECT PERSON PREPOSITION PRONOUN SUBJUNCTIVE VERB WEEK YEAR
--
and the X.voca file is:
---
% NS_B
<s> sil
% NS_E
</s> sil
% ACTION
HELP hh eh l p
HOW hh aw
% ADJECTIVE
BAD b ae d
COLD k ow l d
DELINQUENT d ih l ih ng k w ih n t
FOR f er
GOTTEN g aa t en
NOTICE n ow t ih s
OK ow k
TALKED t ao k t
TOTALLY t ow t ax l iy
VERY v eh r iy
% ADVERB
MUCH m ah ch
PERFECTLY p er f ih k t l iy
SO s ow
THAT dh ae t
% AFFIRMATIVE
WAS w ah z
WE w iy
WERE w er
WHEN w eh n
% COMMON
RECORD r eh k er d
RECORDS r eh k er d z
% CONJUNCTION
AND ae n d
BECAUSE b ax k ah z
BUT b ah t
IN ih n
IN ix n
OR ow r
SO s ow
% DETERMINER
MUCH m ah ch
THE dh ah
% EXCLAMATION
% INDEFINITE
A ax
A ey
% INFINITIVE
I'D ay d
% INTERJECTION
YES y eh s
% MODAL
CAN k ae n
% MONTH
DECEMBER d ih s eh m b er
% NEGATIVES
NO n ow
DON'T d ow n t
% NOUN
CALLING k ao l ih ng
CENT s eh n t
CENTS s eh n t s
COMCAST k aa m k ae s t
DELINQUENT d ih l ih ng k w ih n t
DEXTER d eh k s t er
DOLLAR d ao l er
DOLLARS d ao l er z
MAIL m ey l
NAME n ey m
NATALIE n ae t a l iy
NUMBER n ah m b er
PAYMENT p ey m ih n t
SPEAKING s p iy k ih ng
% NUMBER
EIGHT ey t
EIGHTY ey t iy
FOUR f ow r
SIX s ih k s
SIXTY s ih k s t iy
THIRD th er d
THIRTEEN th er t iy n
TWENTY t w eh n t iy
% PARTICIPLE
CALLED k ao l d
% PERFECT
HAD hh ae d
HAVE hh ae v
LAST l ae s t
UNDERSTAND ah n d er s t ae n d
% PERSON
HERE hh iy r
I ay
IS ih z
MY m ay
THIS dh ih s
YOU y uw
% PREPOSITION
TO t ax
FOR f er
ON aa n
% PRONOUN
ME m iy
YOU y uw
IT ih t
THAT dh ae t
MUCH m ah ch
% SUBJUNCTIVE
SITUATION s ih ch uw ey sh ih n
% VERB
CHECK ch eh k
WORRY w er r iy
THINK th ih ng k
LIKE l ay k
HAD hh ae d
MAILED m ey l d
THANK th ae ng k
CAN k ae n
UNDERSTAND ah n d er s t ae n d
% WEEK
FRIDAY f r ay d ey
% YEAR
----
The generated X.dfa is:
---
0 1 1 0 0
1 26 2 0 0
2 25 3 0 0
3 24 4 0 0
4 23 5 0 0
5 22 6 0 0
6 21 7 0 0
7 20 8 0 0
8 19 9 0 0
9 18 10 0 0
10 17 11 0 0
11 16 12 0 0
12 15 13 0 0
13 14 14 0 0
14 13 15 0 0
15 12 16 0 0
16 11 17 0 0
17 10 18 0 0
18 9 19 0 0
19 8 20 0 0
20 7 21 0 0
21 6 22 0 0
22 5 23 0 0
23 4 24 0 0
24 3 25 0 0
25 2 26 0 0
26 0 27 0 0
27 -1 -1 1 0
---
and the generated X.term is:
--
0 NS_B
1 NS_E
2 ACTION
3 ADJECTIVE
4 ADVERB
5 AFFIRMATIVE
6 COMMON
7 CONJUNCTION
8 DETERMINER
9 EXCLAMATION
10 INDEFINITE
11 INFINITIVE
12 INTERJECTION
13 MODAL
14 MONTH
15 NEGATIVES
16 NOUN
17 NUMBER
18 PARTICIPLE
19 PERFECT
20 PERSON
21 PREPOSITION
22 PRONOUN
23 SUBJUNCTIVE
24 VERB
25 WEEK
26 YEAR
--
but Julius gives the following output:
$ julius -input rawfile -filelist wavlist -C julian.jconf > JuliusOutput
------
### read waveform input
Stat: adin_sndfile: input speechfile: ../train/wav/X1.wav
Stat: adin_sndfile: input format = Microsoft WAV
Stat: adin_sndfile: input type = Signed 16 bit PCM
Stat: adin_sndfile: endian = file native endian
Stat: adin_sndfile: 48000 Hz, 1 channels
STAT: 2880000 samples (60.00 sec.)
STAT: ### speech analysis (waveform -> MFCC)
### Recognition: 1st pass (LR beam)
pass1_best: <s> HOW GOTTEN PERFECTLY WHEN RECORDS BECAUSE MUCH
pass1_best_wordseq: 0 2 3 4 5 6 7 8
pass1_best_phonemeseq: sil | hh aw | g aa t en | p er f ih k t l iy | w eh n | r eh k er d z | b ax k ah z | m ah ch
pass1_best_score: -476364.281250
### Recognition: 2nd pass (RL heuristic best-first)
WARNING: 00 _default: hypothesis stack exhausted, terminate search now
STAT: 00 _default: 0 sentences have been found
WARNING: 00 _default: got no candidates, search failed
STAT: 00 _default: 0 generated, 0 pushed, 0 nodes popped in 17998
<search failed>
------
Queries:
1. Are my X.grammar and X.voca files correct?
2. How can I make sure that the audio format matches the acoustic model format?
3. I have 15 audio files (X1.wav - X15.wav). Why does every Julius output have the same pass1_best, "<s> HOW GOTTEN PERFECTLY WHEN RECORDS BECAUSE MUCH"?
I am willing to share the data if asked. Please help.
Thanks!
---
The problem is that your grammar forces the recognizer to recognize one fixed sequence of categories as a sentence:
CONVERSATION: ACTION ADJECTIVE ADVERB AFFIRMATIVE COMMON CONJUNCTION DETERMINER EXCLAMATION INDEFINITE INFINITIVE INTERJECTION MODAL MONTH NEGATIVES NOUN
This means that in a sentence ACTION must come first, then ADJECTIVE, then ADVERB, then every other category in the order listed. If you want alternatives, you need to do something like
CONVERSATION: ACTION
CONVERSATION: ADJECTIVE
and so on.
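To avoid typing all the rules by hand, the one-rule-per-category form above could be generated with a short script. This is only a sketch in Python: the category names are copied from the X.grammar in the question, and build_grammar is a made-up helper name, not part of Julius.

```python
# Category names copied from the CONVERSATION rule in the question's X.grammar.
CATEGORIES = """
ACTION ADJECTIVE ADVERB AFFIRMATIVE COMMON CONJUNCTION DETERMINER
EXCLAMATION INDEFINITE INFINITIVE INTERJECTION MODAL MONTH NEGATIVES
NOUN NUMBER PARTICIPLE PERFECT PERSON PREPOSITION PRONOUN SUBJUNCTIVE
VERB WEEK YEAR
""".split()


def build_grammar(categories):
    """Return grammar text with one CONVERSATION rule per category.

    Hypothetical helper for illustration; it only formats text.
    """
    lines = ["S : NS_B CONVERSATION NS_E"]
    # One alternative per line, instead of one long sequence.
    lines += [f"CONVERSATION : {name}" for name in categories]
    return "\n".join(lines) + "\n"


grammar_text = build_grammar(CATEGORIES)
print(grammar_text)
```

Note that a grammar of this shape accepts exactly one word between the two silences, so longer utterances need further rules. Whatever you change, X.dfa and X.term have to be regenerated from the new X.grammar and X.voca.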
You can read the documentation at http://julius.sourceforge.jp/en_index.php?q=en_grammar.html to understand grammars in more detail.
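One more thing that seems worth checking: because the original rule requires a word from every category, a category with no words makes a complete sentence impossible. In the posted X.voca, "% EXCLAMATION" and "% YEAR" have no entries, and pass1_best stops at category 8 (DETERMINER), right before EXCLAMATION (9), which is at least consistent with that. A minimal Python sketch for listing empty categories follows; empty_categories and the inline sample are illustrative only, and treating lines that start with "#" as comments is an assumption.

```python
def empty_categories(voca_text):
    """Return the names of '% NAME' categories that have no word entries."""
    counts = {}
    current = None
    for raw in voca_text.splitlines():
        line = raw.strip()
        # Skip blank lines; '#' comment lines are an assumption here.
        if not line or line.startswith("#"):
            continue
        if line.startswith("%"):
            # A new category header such as "% NOUN".
            current = line[1:].strip()
            counts.setdefault(current, 0)
        elif current is not None:
            # Any other line is counted as a word entry of the current category.
            counts[current] += 1
    return [name for name, n in counts.items() if n == 0]


# A few lines taken from the X.voca in the question.
SAMPLE = """\
% DETERMINER
MUCH m ah ch
THE dh ah
% EXCLAMATION
% INDEFINITE
A ax
A ey
% WEEK
FRIDAY f r ay d ey
% YEAR
"""

print(empty_categories(SAMPLE))  # ['EXCLAMATION', 'YEAR']
```

To run it on the real file, read X.voca into a string and pass it to the function in place of SAMPLE.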