VoxForge
The latest 20180611 builds of the German models were trained on 260 hours of training material and, thanks to IPA extraction from the German Wiktionary, now cover a dictionary of more than 375,000 entries.
You can find download links to all our models and dicts here:
https://github.com/gooofy/zamia-speech#download
WER results for these models are not comparable to previous releases: from now on we measure WER on speakers who are not in the training set, and we have also tried to make the language model more neutral (i.e. it no longer over-represents prompts from the training material). The WER results should therefore give a more realistic assessment of the performance one can expect from our models without adaptation.
WER for the Kaldi models is 6.23% for the large model and 7.49% for the embedded model.
WER for the continuous CMU Sphinx model is 29%.
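For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the decoder output, divided by the number of reference words. This is an illustration only and not the scoring script used to produce the figures above; the function name word_error_rate and the sample sentences are made up for this example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only; not part of zamia-speech or Kaldi.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1] / float(len(ref))


if __name__ == "__main__":
    # One of four reference words is missing from the hypothesis.
    print(word_error_rate("das ist ein test", "das ist test"))
```

A WER of 6.23% therefore means roughly six word errors per hundred reference words on the held-out test speakers.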
We have also been quite busy cleaning up our scripts and documentation, so it should become easier to understand what we are doing here. The models come complete with example scripts and pre-compiled binary packages for various platforms. More information on that can be found in our getting started guide here:
https://github.com/gooofy/zamia-speech#get-started-with-our-pre-trained-models
Please note that we have changed the tarball format of our models significantly, so you will have to use the latest 0.3.1 py-kaldi-asr wrappers with these models. The new tarball format allows for model adaptation:
https://github.com/gooofy/zamia-speech#model-adaptation
as well as automatic segmentation and transcript alignment of long audio recordings (e.g. librivox audiobooks):
https://github.com/gooofy/zamia-speech#audiobook-segmentation-and-transcription-kaldi
Comments, suggestions and contributions are very welcome. For more information about the zamia-speech project, please visit http://zamia-speech.org/
The latest http://zamia-speech.org German Kaldi ASR Factorized TDNN model looks quite good:
4.25% WER (previous models: 6.23% WER tdnn_sp, 7.49% WER tdnn_250).
Download here:
Hello Guenter!
Thanks a lot for your work! I have just one question: when I try to use the latest German model with zamia-speech (with the demo script kaldi_decode_wav.py), I always see an error like this:
=================
Traceback (most recent call last):
  File "kaldi_decode_wav.py", line 60, in <module>
    kaldi_model = KaldiNNet3OnlineModel (options.modeldir, acoustic_scale=1.0, beam=7.0, frame_subsampling_factor=3)
  File "kaldiasr/nnet3.pyx", line 134, in kaldiasr.nnet3.KaldiNNet3OnlineModel.__cinit__ (kaldiasr/nnet3.cpp:3549)
RuntimeError
Hello Mr. Smith,
I suspect you're using the Debian packages from the zamia-speech.org repositories. Up to a few hours ago these contained kaldi 5.3, which is too old to support tdnn_f models. I have now uploaded kaldi 5.4 Debian packages, which should run the new models fine. The new tdnn_f models are therefore also included in the zamia-speech Debian model packages, so you should find them installed in /opt/kaldi/models if you're using those.
Cheers,
Guenter