General Discussion

How does someone create a Language Model for Julius/Julian
User: sanyaade
Date: 10/22/2006 1:37 pm
Views: 17598
Rating: 25

Hi All,

I started playing with the Julius speech kit and everything worked very well.
Then I downloaded the Julius GUI to use with the tutorial from VoxForge, but
the kit failed to run because of a missing language model.

I would like to use Julius/Julian with a GUI for my research and to bring the kit to ordinary end users, so I need your help.

1.) Is there any document on how to create language models for Julius/Julian?

2.) How can someone convert language models from Sphinx2 and Sphinx3 to Julius/Julian format? (I used the corpus from the VoxForge tutorial "Create Acoustic Model" to generate a language model with the CMU Quick SLM kit, but Julian did not recognise it.)

Any help, pointers and directions will be highly appreciated. Thanks in advance for giving me your valuable time and effort.

God blesses!!!

Best regards, Sanyaade

--- (Edited on 10/22/2006 1:37 pm [GMT-0500] by sanyaade) ---

--- (Edited on 10/22/2006 1:42 pm [GMT-0500] by sanyaade) ---

--- (Edited on 10/22/2006 1:44 pm [GMT-0500] by sanyaade) ---

--- (Edited on 10/22/2006 1:47 pm [GMT-0500] by sanyaade) ---

--- (Edited on 10/22/2006 2:02 pm [GMT-0500] by sanyaade) ---

Re: How does someone create a Language Model for Julius/Julian
User: kmaclean
Date: 10/22/2006 4:07 pm
Views: 355
Rating: 16

Hi Sanyaade, 
>1.) Is there any document on how-to create language models for Julius/Julian?
Yes, there is. The HTK Speech Recognition Toolkit has a very detailed manual - see this link on the HTK site. Note: you
need to register with HTK. Take a look at the introduction in Chapter 14 and the tutorial on building language models
in Chapter 15.
>2.) How can someone convert language from Sphinx2 and Sphinx3 to Julius/Julian format? 
Julius/Julian and the Sphinx Group of Speech Recognition Engines both use ARPA standard format Language Models. The
problem is that Sphinx Group Language Models are compiled into binary form, and Julius cannot read this compiled format
(Julius has its own compiled format). So you need the "source" used to create your own language model. The current
focus of VoxForge is on collecting speech audio for Acoustic Models. Once we get 140 hours of speech audio (the target
for release one of the VoxForge acoustic models), we will start looking at collecting "source" data to create free GPL
Language Models.
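
As a rough illustration of the text-versus-compiled distinction above, here is a minimal Python sketch that guesses whether a file is a plain-text ARPA model. The function name and the 64 KB probe size are assumptions made for this example, not part of Julius or Sphinx, and the NUL-byte test is only a heuristic for spotting compiled files.

```python
def looks_like_arpa_text(path, probe_bytes=65536):
    """Guess whether `path` is a plain-text ARPA language model.

    Heuristic only: inspects the first `probe_bytes` bytes of the file.
    """
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes)
    # Compiled (binary) models typically contain NUL bytes; text files do not.
    if b"\x00" in head:
        return False
    # A text ARPA file introduces its n-gram counts with a line reading \data\
    lines = [line.strip() for line in head.splitlines()]
    return b"\\data\\" in lines
```

A file that passes this check is at least readable text with an ARPA header; it says nothing about whether the n-gram orders match what Julius expects.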
Hope that helps,
Ken 
 

 

--- (Edited on 10/22/2006 8:31 pm [GMT-0400] by kmaclean) ---

Re: How does someone create a Language Model for Julius/Julian
User: sanyaade
Date: 10/23/2006 7:16 am
Views: 202
Rating: 1

Thanks Ken,

I thought as much about why Julius will not read
Sphinx language models. I will check out HTK.

I am a registered user on the site, so I will see how it goes
with the tutorial and with building the kit from the pages you
pointed out to me.

Great man and God blesses!!!
Best regards,
Sanyaade

--- (Edited on 10/23/2006 7:16 am [GMT-0500] by sanyaade) ---

--- (Edited on 10/23/2006 7:17 am [GMT-0500] by sanyaade) ---

--- (Edited on 10/23/2006 7:18 am [GMT-0500] by sanyaade) ---

Re: How does someone create a Language Model for Julius/Julian
User: trevarthan
Date: 4/1/2007 1:25 am
Views: 308
Rating: 10

Surely there are some language models out there for download that we can use. I'd personally just like to test Julius's dictation mode. I don't care if the corpus is professional quality or open source for the moment. I just want to give it a go.

Would the language model file from this URL work? (It's labeled ARPA and didn't look binary when I downloaded it.)
http://www.arborius.net/~jphekman/sphinx/full/index.html

--- (Edited on 4/ 1/2007 1:25 am [GMT-0500] by trevarthan) ---

Re: How does someone create a Language Model for Julius/Julian
User: kmaclean
Date: 4/1/2007 1:38 pm
Views: 2190
Rating: 6

Hi trevarthan,

>Surely there are some language models out there for download that we can use.

I used to think the same about free Speech Corpora for the creation of Acoustic Models ...

>Would the language model file from this URL work ...

It should, since it is in ARPA format. I have not played much with LMs - too busy collecting speech audio.
Julius uses two files for its Language Model: a 2-gram file and a reverse 3-gram file. See the Julius Book (section 3), the Julius manual, and the Julius config file for more info on this.
See this post for more info on how to create a reverse 3-gram.
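
For what it is worth, a reverse 3-gram is, as far as I understand it, an ordinary 3-gram trained on text whose word order has been reversed within each sentence. The Python sketch below shows only that corpus-preparation step; the function names are made up for this example, and it assumes one sentence per line with no sentence markers such as <s> and </s> (how those should be handled depends on the LM toolkit).

```python
def reverse_sentence(line):
    """Return the words of one sentence in reverse order."""
    return " ".join(reversed(line.split()))


def reverse_corpus(lines):
    """Yield each non-empty sentence with its word order reversed."""
    for line in lines:
        if line.strip():
            yield reverse_sentence(line)


if __name__ == "__main__":
    sample = ["the cat sat on the mat", "", "open the door"]
    for reversed_line in reverse_corpus(sample):
        print(reversed_line)
```

The reversed text would then be fed to whichever LM toolkit you use to build the 3-gram, while the 2-gram is built from the original, unreversed text.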

Licensing

The LM located at the URL you cite has the following Copyright header:

#############################################################################
## Copyright (c) 1996, Carnegie Mellon University, Cambridge University,
## Ronald Rosenfeld and Philip Clarkson
#############################################################################

I did not see any licensing information, so I am not sure what kind of distribution restrictions there might be. Usually this means you can only use it for personal use, and that you cannot redistribute it (the default under copyright law is that you cannot distribute anything unless you have a license that permits you to do so).

There is a Hub4 Language Model distributed by LDC. Its entry is silent on Copyright, so it might be open source.

If you can get the LM to work with Julius for dictation, I can follow up with CMU with respect to licensing and maybe use it as a starting point for a VoxForge Language Model. 

Ken 

 

--- (Edited on 4/ 1/2007 2:38 pm [GMT-0400] by kmaclean) ---

Re: How does someone create a Language Model for Julius/Julian
User: kmaclean
Date: 4/2/2007 4:17 pm
Views: 2783
Rating: 16

Email from Sanyaade: 

Jesse wrote:

Would the language model file from this URL work (it's labeled ARPA and doesn't look binary when I downloaded it) ?
http://www.arborius.net/~jphekman/sphinx/full/index.html

REPLY:

No. Why? Because Julius uses a bi-gram and a reverse tri-gram for its language model.
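
To see which n-gram orders a downloaded ARPA file actually declares, one can read the "ngram N=count" lines in its header. The Python sketch below does that; the function name is hypothetical, and note that the header cannot reveal whether a 3-gram was trained on reversed text.

```python
import re


def arpa_ngram_counts(lines):
    """Return {order: count} parsed from the header of an ARPA file.

    `lines` is any iterable of text lines, for example an open file.
    """
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header:
            continue  # skip comments or copyright text before the header
        match = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line.startswith("\\"):
            break  # the first "\1-grams:" style section ends the header
    return counts
```

For example, calling it on an open file handle and checking whether the keys include 2 and 3 shows whether the file carries bi-gram and tri-gram sections at all.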

I will send you brief step-by-step instructions on how to create a language model for Julius using another toolkit (this weekend). You will still need to create an acoustic model using the same dictionary and vocabulary.

Stay blessed!!!

Best regards,

Sanyaade

--- (Edited on 4/ 2/2007 5:17 pm [GMT-0400] by kmaclean) ---
