New 31k-word, 310-hour German models released
User: guenter
Date: 10/15/2017 8:06 pm

Just uploaded the latest 20171016 release of the German VoxForge models to

http://goofy.zamia.org/voxforge/de/

Besides the usual inclusion of all new VoxForge submissions, this release focuses on noise resistance.

I have added some 100 hours of noisy recordings (auto-generated by adding random background and foreground noise to existing recordings) and introduced a new "NSPC" noise phoneme and dictionary entry.  
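The noisy-recording generation described above boils down to mixing a noise clip into each utterance at some signal-to-noise ratio. A minimal sketch of that idea, using synthetic NumPy arrays in place of real recordings (the SNR-based gain formula here is an assumption; the actual augmentation procedure used for the release may differ):

```python
import numpy as np

def mix_noise(speech, noise, snr_db):
    """Mix a noise clip into a speech signal at a target SNR in dB."""
    # Loop the noise clip so it covers the whole utterance, then trim.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[:len(speech)]
    # Scale the noise so speech power / noise power matches snr_db.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

# Synthetic 16 kHz signals stand in for real recordings here.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)   # 1 s of "speech"
noise = rng.standard_normal(4000)     # short background clip, gets looped
noisy = mix_noise(speech, noise, snr_db=rng.uniform(5, 20))
```

Picking the SNR at random per utterance, as in the last line, is what makes the augmented set cover a range of noise conditions.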

Also, I have further pruned the language model, which greatly reduces the nnet3 model's memory footprint.

Please note that this is most likely the last release to feature Kaldi GMM models - nnet3 yields much better results, so I don't think it makes sense to spend the compute time producing the GMM models (drop me a line if you would like them continued).

With the addition of noisy recordings, WER has degraded somewhat from previous releases (noisy recordings are much harder to decode, after all) - I hope you will still find this new model useful, especially in noisy/distant-microphone situations.

stats:

31207 lexicon entries.
total duration of all good submissions: 311:33:41
CMU Sphinx models:
cmusphinx cont model: SENTENCE ERROR: 45.2% (3969/8788)   WORD ERROR RATE: 10.4% (10338/99119)
cmusphinx ptm model: SENTENCE ERROR: 38.3% (3365/8788)   WORD ERROR RATE: 10.8% (10665/99119)
Kaldi models:
%WER 11.15 [ 10998 / 98672, 1880 ins, 2614 del, 6504 sub ] exp/tri3b/decode.si/wer_14_0.0
%WER 10.65 [ 10507 / 98672, 1026 ins, 4347 del, 5134 sub ] exp/tri3b_mmi/decode/wer_12_0.0
%WER 10.61 [ 10468 / 98672, 1575 ins, 3286 del, 5607 sub ] exp/tri2b/decode/wer_15_0.0
%WER 10.52 [ 10385 / 98672, 982 ins, 4346 del, 5057 sub ] exp/tri3b_mmi_b0.05/decode/wer_12_0.0
%WER 10.06 [ 9922 / 98672, 1402 ins, 3193 del, 5327 sub ] exp/tri3b_mpe/decode/wer_14_0.0
%WER 7.77 [ 7663 / 98672, 1087 ins, 2457 del, 4119 sub ] exp/tri2b_mpe/decode/wer_13_0.0
%WER 7.32 [ 7225 / 98672, 780 ins, 2688 del, 3757 sub ] exp/tri2b_mmi/decode/wer_12_0.0
%WER 7.32 [ 7221 / 98672, 829 ins, 2631 del, 3761 sub ] exp/tri2b_mmi_b0.05/decode/wer_11_0.0
%WER 7.13 [ 7038 / 98672, 1625 ins, 1737 del, 3676 sub ] exp/tri3b/decode/wer_15_0.0
%WER 3.72 [ 3673 / 98672, 719 ins, 1666 del, 1288 sub ] exp/nnet3/nnet_tdnn_a/decode/wer_11_0.0
sequitur g2p model:
    total: 3118 strings, 36657 symbols
    successfully translated: 3116 (99.94%) strings, 36635 (99.94%) symbols
        string errors:       1263 (40.53%)
        symbol errors:       2822 (7.70%)
            insertions:      980 (2.68%)
            deletions:       985 (2.69%)
            substitutions:   857 (2.34%)
    translation failed:      2 (0.06%) strings, 22 (0.06%) symbols
    total string errors:     1265 (40.57%)
    total symbol errors:     2844 (7.76%)
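The %WER lines above break errors down into insertions, deletions, and substitutions, which come from a minimum-edit-distance alignment of the hypothesis against the reference. A rough sketch of that computation (a plain Levenshtein alignment; scoring tools like Kaldi's may break ties differently):

```python
def wer_counts(ref, hyp):
    """Align two word lists by minimum edit distance and return
    (insertions, deletions, substitutions)."""
    R, H = len(ref), len(hyp)
    # dp[i][j] = (total_errors, ins, dele, sub) for ref[:i] vs hyp[:j]
    dp = [[None] * (H + 1) for _ in range(R + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(R + 1):
        for j in range(H + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0:                      # deletion: ref word missing
                c, ins, dele, sub = dp[i - 1][j]
                cands.append((c + 1, ins, dele + 1, sub))
            if j > 0:                      # insertion: extra hyp word
                c, ins, dele, sub = dp[i][j - 1]
                cands.append((c + 1, ins + 1, dele, sub))
            if i > 0 and j > 0:            # match or substitution
                c, ins, dele, sub = dp[i - 1][j - 1]
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                cands.append((c + cost, ins, dele, sub + cost))
            dp[i][j] = min(cands)
    _, ins, dele, sub = dp[R][H]
    return ins, dele, sub

ref = "berlin gilt als weltstadt der kultur".split()
hyp = "berlin gilt als der kultur und".split()
ins, dele, sub = wer_counts(ref, hyp)
wer = 100.0 * (ins + dele + sub) / len(ref)   # %WER over reference words
```

The denominator is always the number of reference words, which is why the %WER lines above all divide by the same totals (98672, 99119).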

 

Re: New 31k-word, 310-hour German models released
User: guenter
Date: 11/9/2017 6:39 pm

I have been experimenting with Kaldi 5.2 TDNN-chain models lately and they show very promising results. This is not a complete release, but I have uploaded my latest model, called

kaldi-chain-voxforge-de-r20171109.tar.xz

to the usual model download server here:

http://goofy.zamia.org/voxforge/de/

The test decode result for this model was:

%WER 1.18 [ 1174 / 99422, 188 ins, 373 del, 613 sub ] exp/nnet3_chain/tdnn_sp/decode_test/wer_9_0.0
Also pretty impressive is the decoding speed: I measured 7.67s decode time for a 4.7s wave file on a Raspberry Pi 3 (!)
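That measurement is easiest to compare across machines as a real-time factor (decode time divided by audio duration); a trivial sketch of the arithmetic:

```python
def real_time_factor(decode_s, audio_s):
    """RTF < 1.0 means decoding keeps up with real time."""
    return decode_s / audio_s

# The Raspberry Pi 3 measurement quoted above: 7.67 s to decode 4.7 s of audio.
rtf = real_time_factor(7.67, 4.7)
print(f"RTF = {rtf:.2f}")   # about 1.63, i.e. ~1.6x slower than real time
```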
Please note that to use this model I recommend the latest kaldi-asr 5.2 plus py-kaldi-asr 0.2.0.
Re: New 31k-word, 310-hour German models released
User: guenter
Date: 11/13/2017 2:21 pm

Another quick update: by reducing the layer size to 250, I managed to create another Kaldi nnet3 chain model that achieves near-realtime performance on a Raspberry Pi 3:

[bofh@donald py-kaldi-asr]$ python examples/chain_incremental.py
tdnn_250 loading model...
tdnn_250 loading model... done, took 7.084181s.
tdnn_250 creating decoder...
tdnn_250 creating decoder... done, took 14.327128s.
decoding data/gsp1.wav...
 0.041s:  4000 frames ( 0.250s) decoded.
 0.319s:  8000 frames ( 0.500s) decoded.
 0.643s: 12000 frames ( 0.750s) decoded.
 0.864s: 16000 frames ( 1.000s) decoded.
 1.086s: 20000 frames ( 1.250s) decoded.
 1.312s: 24000 frames ( 1.500s) decoded.
 1.530s: 28000 frames ( 1.750s) decoded.
 1.760s: 32000 frames ( 2.000s) decoded.
 2.133s: 36000 frames ( 2.250s) decoded.
 2.387s: 40000 frames ( 2.500s) decoded.
 2.624s: 44000 frames ( 2.750s) decoded.
 2.840s: 48000 frames ( 3.000s) decoded.
 3.080s: 52000 frames ( 3.250s) decoded.
 3.449s: 56000 frames ( 3.500s) decoded.
 3.682s: 60000 frames ( 3.750s) decoded.
 3.939s: 64000 frames ( 4.000s) decoded.
 4.165s: 68000 frames ( 4.250s) decoded.
 4.375s: 72000 frames ( 4.500s) decoded.
 4.952s: 75200 frames ( 4.700s) decoded.
*****************************************************************
** data/gsp1.wav
** berlin gilt als weltstadt der kultur politik medien und wissenschaften
** tdnn_250 likelihood: 1.71563148499
*****************************************************************
tdnn_250 decoding took     4.96s
[bofh@donald py-kaldi-asr]$ uname -a
Linux donald 4.9.40-v7.1.el7 #1 SMP Tue Aug 8 14:03:02 UTC 2017 armv7l armv7l armv7l GNU/Linux
While WER still looks good:
%WER 1.18 [ 1174 / 99422, 188 ins, 373 del, 613 sub ] exp/nnet3_chain/tdnn_sp/decode_test/wer_9_0.0
%WER 1.57 [ 1563 / 99422, 250 ins, 446 del, 867 sub ] exp/nnet3_chain/tdnn_250/decode_test/wer_8_0.0
Both models are available for download here:
http://goofy.zamia.org/voxforge/de/kaldi-chain-voxforge-de-r20171113.tar.xz