Problems using new german voxforge models
User: Martin112
Date: 1/8/2017 11:46 am
Views: 9643
Rating: 0

Hi,

I just started developing with sphinx4 (version: 5prealpha-snapshot). After some successful testing with the default English model, I tried to use the German VoxForge model. I downloaded the file cmusphinx-cont-voxforge-de-r20161117.tar.xz and tried to use it with sphinx4. On startup, the following error occurred:

18:12:20.936 INFO largeTrigramModel    Loading n-gram language model from: file:vox_cont/etc/voxforge.lm.dmp
Exception in thread "main" java.lang.Error: Bad binary LM file magic number: 1701409364, not an LM dumpfile?
    at edu.cmu.sphinx.linguist.language.ngram.large.BinaryLoader.readHeader(BinaryLoader.java:469)
    at edu.cmu.sphinx.linguist.language.ngram.large.BinaryLoader.loadModelLayout(BinaryLoader.java:393)
    at edu.cmu.sphinx.linguist.language.ngram.large.BinaryLoader.<init>(BinaryLoader.java:99)
    at edu.cmu.sphinx.linguist.language.ngram.large.LargeNGramModel.allocate(LargeNGramModel.java:206)
    at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:334)
    at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:243)
    at edu.cmu.sphinx.decoder.AbstractDecoder.allocate(AbstractDecoder.java:103)
    at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:164)
    at edu.cmu.sphinx.api.StreamSpeechRecognizer.startRecognition(StreamSpeechRecognizer.java:52)
    at edu.cmu.sphinx.api.StreamSpeechRecognizer.startRecognition(StreamSpeechRecognizer.java:39)
    at de.martin.sphinxtest.TranscriberDemo.test1(TranscriberDemo.java:61)
    at de.martin.sphinxtest.TranscriberDemo.main(TranscriberDemo.java:149)
------------------------------------------------------------------------

I use the following configuration code:

configuration.setAcousticModelPath("file:vox_cont/model_parameters/voxforge.cd_cont_6000");   
configuration.setDictionaryPath("file:vox_cont/etc/voxforge.dic");       
configuration.setLanguageModelPath("file:vox_cont/etc/voxforge.lm.dmp");

I also tried an older German VoxForge model from 2014, and it runs without exceptions.

If anyone has an idea where the error lies, I would be grateful for any pointers.

Thanks in advance,

Martin

Re: Problems using new german voxforge models
User: nsh
Date: 1/8/2017 11:57 am
Views: 4
Rating: 1

This LM is in trie format, so it should be named "voxforge.lm.bin". If you rename the file, it should load properly.
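
As an aside, the magic number in the first error message can be decoded by hand. The sketch below is only an illustration: it assumes the reported value holds the first four bytes of the file with the lowest byte first (an assumption about how the loader reports it, not something stated in this thread), and the class and method names are hypothetical, not part of sphinx4.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of sphinx4.
class LmMagic {

    /** Reinterprets a 32-bit value as four ASCII characters, lowest byte first. */
    static String decodeLittleEndian(int magic) {
        byte[] bytes = ByteBuffer.allocate(4)
                .order(ByteOrder.LITTLE_ENDIAN) // assumption: lowest byte comes first in the file
                .putInt(magic)
                .array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Value taken from the "Bad binary LM file magic number" error above.
        System.out.println(decodeLittleEndian(1701409364)); // prints "Trie"
    }
}
```

Decoded this way, 1701409364 spells "Trie", which is consistent with a trie-format model having been passed to the loader for .dmp files.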

Re: Problems using new german voxforge models
User: Martin112
Date: 1/8/2017 12:49 pm
Views: 14
Rating: 0

Thanks a lot! The error does not occur anymore.

Unfortunately, another error now occurs when calling recognizer.startRecognition(stream):

19:37:34.708 INFO trieNgramModel       Loading n-gram language model from: file:vox_cont/etc/voxforge.lm.bin
2017-01-08 19:37:34 SCHWERWIEGEND de.martin.sphinxtest.TranscriberDemo main null
java.lang.NullPointerException
    at edu.cmu.sphinx.linguist.language.ngram.trie.NgramTrieQuant.setTable(NgramTrieQuant.java:50)
    at edu.cmu.sphinx.linguist.language.ngram.trie.BinaryLoader.readQuant(BinaryLoader.java:95)
    at edu.cmu.sphinx.linguist.language.ngram.trie.NgramTrieModel.allocate(NgramTrieModel.java:225)
    at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:334)
    at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:243)
    at edu.cmu.sphinx.decoder.AbstractDecoder.allocate(AbstractDecoder.java:103)
    at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:164)
    at edu.cmu.sphinx.api.StreamSpeechRecognizer.startRecognition(StreamSpeechRecognizer.java:52)
    at edu.cmu.sphinx.api.StreamSpeechRecognizer.startRecognition(StreamSpeechRecognizer.java:39)
    at de.martin.sphinxtest.TranscriberDemo.test1(TranscriberDemo.java:61)
    at de.martin.sphinxtest.TranscriberDemo.main(TranscriberDemo.java:149)

------------------------------------------------------------------------

Do you know a solution to this problem?

Re: Problems using new german voxforge models
User: nsh
Date: 1/8/2017 1:56 pm
Views: 5
Rating: 1

Ok, the LM is also corrupted.

I repackaged the files at

https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/German/

Download them and try again. I verified that they work with the latest sphinx4.

Re: Problems using new german voxforge models
User: Martin112
Date: 1/8/2017 2:20 pm
Views: 207
Rating: 1

The new files work, with no more crashes. Thank you very much!

Re: Problems using new german voxforge models
User: guenter
Date: 1/15/2017 2:32 pm
Views: 2
Rating: 0

> Ok, the LM is also corrupted.

Uh, that sounds bad. What did you do to fix it?

Re: Problems using new german voxforge models
User: nsh
Date: 1/15/2017 3:37 pm
Views: 99
Rating: 0

Make sure you are using the latest sphinxbase for the conversion. It is also better to avoid such a big LM, since the extra size is of no use. You can prune it to the size of the one currently uploaded on the CMUSphinx site with no accuracy drawbacks.

Re: Problems using new german voxforge models
User: guenter
Date: 1/18/2017 7:16 am
Views: 2
Rating: 0

Thanks for the quick reply. What tools/options would you recommend for LM pruning?

Re: Problems using new german voxforge models
User: nsh
Date: 1/18/2017 7:26 am
Views: 73
Rating: 0

SRILM. You can use something like

     ngram -prune 1e-9 -lm your.lm -write-lm your-pruned.lm

to reduce the LM size significantly.

Re: Problems using new german voxforge models
User: guenter
Date: 1/18/2017 2:41 pm
Views: 2
Rating: 0

Ah, SRILM, very cool :)

BTW: I was wondering whether I could or should build just one LM using SRILM for both Sphinx and Kaldi, instead of my current approach where I build a separate LM using cmuclmtk for Sphinx.

Anyway, I will put this on my TODO list for the next iteration of the German model. Thanks again for your help and comments!
