How to fix with the unicode text while finding the sample.dict file

Comments

User: prithviraj
Date: 3/13/2018 4:23 am

Views: 2369
Rating: 0

Hi,

I am doing this for Odia language recognition. In Step 1, I took the sample.grammar file and the sample.voca file (which contains some Odia words with their phones) and ran them through mkdfa.jl. It generated the corresponding sample.dfa, sample.term and sample.dict files. However, I cannot find anything readable in the sample.dict file (it contains only "?" symbols). I think it might be a Unicode problem.

So, how can I fix this problem?

Thanks in advance.

Prithviraj
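
One way to test the Unicode suspicion is to look at the raw bytes of the generated file rather than at how an editor renders it. The following is a minimal sketch in Python, not part of Julius or the tutorial; the helper name `describe_bytes` is hypothetical. It reports whether the bytes are valid UTF-8, whether they start with a UTF-8 byte order mark, and how many literal "?" bytes they contain. Many literal "?" bytes with no non-ASCII bytes would suggest the text was already lost when the file was written, while valid UTF-8 with non-ASCII bytes would suggest that only the viewer is at fault.

```python
import codecs


def describe_bytes(data: bytes) -> dict:
    """Summarize raw file bytes (hypothetical helper, for illustration only)."""
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {
        "valid_utf8": valid_utf8,
        "has_utf8_bom": data.startswith(codecs.BOM_UTF8),
        # Bytes >= 0x80 only occur in non-ASCII text such as Odia.
        "non_ascii_bytes": sum(1 for b in data if b >= 0x80),
        # Literal "?" bytes are what a lossy conversion typically leaves behind.
        "question_marks": data.count(b"?"),
    }


# In-memory stand-ins for file contents, so the sketch runs without any files.
intact = describe_bytes("ଆଠ ଆ ଠ୍ ଅ".encode("utf-8"))
lossy = describe_bytes(b"?? ? ?? ?")
print(intact)
print(lossy)
```

To apply it to a real file, the bytes can be read with `open("sample.dict", "rb").read()` and passed to the same function.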

Re: How to fix with the unicode text while finding the sample.dict file

User: kmaclean
Date: 3/13/2018 8:23 am

Views: 0
Rating: 0

>I think it might be the problem of unicode system.

Julius was originally written for Japanese speech recognition, so it should be able to handle non-ASCII text...

Try building the dictionary with the Julius version of the same program (written in Perl): mkdfa.pl

Re: How to fix with the unicode text while finding the sample.dict file

User: prithviraj
Date: 3/14/2018 6:06 am

Views: 2
Rating: 0

It (mkdfa.pl) works fine for the sample.grammar and sample.voca given in the tutorial, but it does not work for our modified sample.grammar and sample.voca files.

sample.grammar

S : NS_B SENT NS_E
SENT: DIGIT

sample.voca

% NS_B
<s> sil

% NS_E
</s> sil

% DIGIT
ଆଠ ଆ ଠ୍ ଅ
ଏକ ଏ କ୍ ଅ
ଚାରି ଚ ଆ ର୍ ଇ
ଛଅ ଛ୍ ଅ ଅ
ତିନି ତ୍ ଇ ନ୍ ଇ
ଦୁଇ ଦ୍ ଉ ଇ
ନଅ ନ୍ ଅ ଅ
ପାଞ୍ଚ ପ୍ ଆ ଞ ଚ୍ ଅ
ଶୂନ ଶ୍ ଊ ନ୍ ଅ
ସାତ ସ୍ ଆ ତ୍ ଅ

sample1.grammar has 2 rules
sample1.voca has 2 categories and 13 words
---
Warning: dfa_minimize not found in the same place as mkdfa.p
Warning: no minimization performed
Now parsing grammar file
Now modifying grammar to minimize states[-1]
Now parsing vocabulary file
Now making nondeterministic finite automaton[1/1]
Error: undefined class "NS_B"
---

no .dfa or .dict file generated

How can I fix this problem?

Thanks in advance.

Prithviraj
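
The counts in the log above do not match the voca file as posted, which lists 3 categories and 12 words. One possible explanation, which the thread does not confirm, is an invisible byte order mark (U+FEFF) at the start of the file: the first line would then no longer begin with "%" and would be counted as a word, leaving NS_B undefined. The sketch below is a simplified model of that counting, assuming a line starting with "%" opens a category and any other non-empty line is a word. It is not the actual mkdfa.pl code, and `count_voca` is a hypothetical name.

```python
def count_voca(text: str) -> tuple:
    """Count (categories, words) under the simplified assumption stated above."""
    categories = 0
    words = 0
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # assumption: "#" starts a comment
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("%"):
            categories += 1
        else:
            words += 1
    return categories, words


digits = [
    "ଆଠ ଆ ଠ୍ ଅ", "ଏକ ଏ କ୍ ଅ", "ଚାରି ଚ ଆ ର୍ ଇ", "ଛଅ ଛ୍ ଅ ଅ", "ତିନି ତ୍ ଇ ନ୍ ଇ",
    "ଦୁଇ ଦ୍ ଉ ଇ", "ନଅ ନ୍ ଅ ଅ", "ପାଞ୍ଚ ପ୍ ଆ ଞ ଚ୍ ଅ", "ଶୂନ ଶ୍ ଊ ନ୍ ଅ", "ସାତ ସ୍ ଆ ତ୍ ଅ",
]
voca = "\n".join(["% NS_B", "<s> sil", "% NS_E", "</s> sil", "% DIGIT"] + digits)

clean_counts = count_voca(voca)
bom_counts = count_voca("\ufeff" + voca)  # same text with a leading BOM

print(clean_counts)  # (3, 12)
print(bom_counts)    # (2, 13), the figures reported in the log
```

Other causes, such as a different encoding altogether, could produce similar symptoms, so this only shows that a leading byte order mark is consistent with the reported numbers.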

Re: How to fix with the unicode text while finding the sample.dict file

User: kmaclean
Date: 3/14/2018 7:23 am

Views: 1
Rating: 0

>How to fix this problem.

You might try compiling Julius from source to see whether the mkfa program (written in C and called by mkdfa.pl) in the gramtools folder can process your system's default character encoding.

Re: How to fix with the unicode text while finding the sample.dict file

User: prithviraj
Date: 3/15/2018 12:42 am

Views: 1
Rating: 0

Thanks, kmaclean.

I edited the grammar and voca files in Notepad++, which supports Unicode formats. Using mkdfa.pl/mkdfa.jl, I am now able to generate the corresponding dict, dfa and term files.

Prithviraj
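
Re-saving the files in an editor can also be done with a script. The sketch below is one way to do it in Python, assuming the tools expect UTF-8 without a byte order mark, which the thread does not state explicitly. The helper name `to_plain_utf8` is hypothetical, and it handles only UTF-8 and UTF-16 input.

```python
import codecs


def to_plain_utf8(data: bytes) -> bytes:
    """Return the same text encoded as UTF-8 without a byte order mark."""
    if data.startswith(codecs.BOM_UTF8):
        # UTF-8 with BOM: drop the three BOM bytes, keep the rest.
        text = data[len(codecs.BOM_UTF8):].decode("utf-8")
    elif data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and removes it.
        text = data.decode("utf-16")
    else:
        # Assumption: anything else is already UTF-8; raises if it is not.
        text = data.decode("utf-8")
    return text.encode("utf-8")


from_bom = to_plain_utf8(codecs.BOM_UTF8 + "% DIGIT\nଆଠ ଆ ଠ୍ ଅ\n".encode("utf-8"))
from_utf16 = to_plain_utf8("% DIGIT\nଆଠ ଆ ଠ୍ ଅ\n".encode("utf-16"))
print(from_bom == from_utf16)
```

For a real file, the bytes would be read with `open(path, "rb")` and the result written back with `open(path, "wb")`, ideally to a copy so the original is kept.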
