
Acoustic Model Discussions

New 160k words 1080 hours English models released
User: guenter
Date: 10/30/2017 5:01 pm

After working on the German VoxForge model for some time, I have now applied the scripts I developed for those models to a combination of the English LibriSpeech and VoxForge corpora. The resulting models can be downloaded from:

http://goofy.zamia.org/voxforge/en/

The scripts I am using to build my models can be found on github here:

https://github.com/gooofy/speech

While I took a largely manual approach for the German models, I decided to try a more or less fully automated approach for the English ones, mostly because a lot of speech model resources are already available for English (whereas for German I had to start pretty much from scratch).

The lexicon is based on CMUdict, to which I added missing entries using Sequitur G2P (trained on CMUdict).
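To illustrate the lexicon step: the words that need G2P are those occurring in the corpus transcripts that have no CMUdict entry. The minimal Python sketch below assumes the classic CMUdict text format (`WORD  PH1 PH2 ...`, alternate pronunciations written as `WORD(2)`, comment lines starting with `;;;`). The function names are hypothetical and not taken from the author's scripts. The resulting word list could then be fed to a trained Sequitur model (for example `g2p.py --model MODEL --apply WORDLIST`).

```python
import re
from typing import Iterable, Set

# Alternate pronunciations in CMUdict are written as WORD(2), WORD(3), ...
_VARIANT = re.compile(r"\(\d+\)$")


def load_cmudict_words(lines: Iterable[str]) -> Set[str]:
    """Return the (lowercased) headwords found in CMUdict-formatted lines."""
    words = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        head = line.split(maxsplit=1)[0]
        words.add(_VARIANT.sub("", head).lower())
    return words


def missing_words(transcripts: Iterable[str], lexicon: Set[str]) -> Set[str]:
    """Hypothetical helper: words used in transcripts but absent from the lexicon."""
    seen = set()
    for text in transcripts:
        seen.update(text.lower().split())
    return seen - lexicon


if __name__ == "__main__":
    cmudict = ["ABOUT  AH0 B AW1 T", "READ  R EH1 D", "READ(2)  R IY1 D", ";;; comment"]
    lex = load_cmudict_words(cmudict)
    print(sorted(missing_words(["read about zamia", "about voxforge"], lex)))
    # ['voxforge', 'zamia']
```

Only the missing words go through G2P, so the hand-curated CMUdict pronunciations are never overwritten by predicted ones.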

The audio recordings consist of:

  •  the "good" LibriSpeech recordings
  •  those same recordings with noise and reverb added to them at random
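The noise/reverb augmentation can be illustrated with a small NumPy sketch: the clean signal is convolved with a room impulse response (RIR) to simulate reverb, and noise is then mixed in at a randomly chosen signal-to-noise ratio. The `augment` helper, the 5 to 20 dB SNR range and the synthetic inputs are assumptions for illustration only; the actual scripts in the GitHub repository may use different tools and parameters.

```python
import numpy as np


def augment(clean: np.ndarray, rir: np.ndarray, noise: np.ndarray,
            rng: np.random.Generator, snr_range=(5.0, 20.0)) -> np.ndarray:
    """Hypothetical helper: add reverb (RIR convolution) plus noise at a random SNR in dB.

    snr_range is an illustrative assumption, not a value from the original scripts.
    """
    # Reverb: convolve with the impulse response, keep the original length.
    reverbed = np.convolve(clean, rir)[: len(clean)]
    # Repeat/crop the noise so it covers the whole utterance.
    reps = int(np.ceil(len(reverbed) / len(noise)))
    noise = np.tile(noise, reps)[: len(reverbed)]
    # Scale the noise so that 10 * log10(P_signal / P_noise) == snr_db.
    snr_db = rng.uniform(*snr_range)
    p_sig = np.mean(reverbed ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return reverbed + scale * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    t = np.arange(sr) / sr
    clean = 0.5 * np.sin(2 * np.pi * 440 * t)                          # 1 s test tone
    rir = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)   # toy decaying RIR
    rir /= np.max(np.abs(rir))
    noise = rng.standard_normal(4000)
    print(augment(clean, rir, noise, rng).shape)
```

Drawing the SNR per utterance gives the model a spread of noise conditions instead of a single fixed one.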

 

I trained a first Kaldi nnet3 model on these recordings, then used it to decode all recordings of the English VoxForge corpus and added to my corpus those recordings whose decoding results matched their transcripts. I iterated this process once more (and plan to do more iterations in the future, along with manual reviews).
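The selection step can be sketched as a simple filter: keep an utterance only if its decoded word sequence is identical to its reference transcript. The sketch below assumes both references and hypotheses are available as Kaldi-style `utt-id word word ...` lines and compares them case-insensitively; `read_text` and `select_matching` are hypothetical names, not functions from the author's repository.

```python
from typing import Dict, Iterable, List


def read_text(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse Kaldi-style 'utt-id word1 word2 ...' lines into {utt_id: [words]}."""
    result = {}
    for line in lines:
        parts = line.split()
        if parts:
            result[parts[0]] = [w.lower() for w in parts[1:]]
    return result


def select_matching(refs: Dict[str, List[str]],
                    hyps: Dict[str, List[str]]) -> List[str]:
    """Utterance ids whose decoding result exactly equals the reference transcript."""
    return sorted(u for u, ref in refs.items() if hyps.get(u) == ref)


if __name__ == "__main__":
    refs = read_text(["vf-001 hello world", "vf-002 good morning", "vf-003 thank you"])
    hyps = read_text(["vf-001 HELLO world", "vf-002 good mourning"])
    print(select_matching(refs, hyps))  # ['vf-001']
```

In an iterative setup, the selected ids would be added to the training set, the model retrained, and the remaining recordings decoded again with the improved model.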

stats:

159373 lexicon entries.
total duration of all good submissions: 1038:59:40
Kaldi:
%WER 7.30 [ 36196 / 496128, 2226 ins, 16007 del, 17963 sub ] exp/nnet3/nnet_tdnn_a/decode/wer_8_0.0
CMU Sphinx models:
cmusphinx cont model: SENTENCE ERROR: 85.5% (12906/15093)   WORD ERROR RATE: 18.0% (89407/496158)
cmusphinx ptm model: SENTENCE ERROR: 89.2% (13467/15093)   WORD ERROR RATE: 24.2% (120169/496158)
sequitur g2p model:
    total: 13147 strings, 99753 symbols
    successfully translated: 13146 (99.99%) strings, 99746 (99.99%) symbols
        string errors:       4881 (37.13%)
        symbol errors:       9557 (9.58%)
            insertions:      2190 (2.20%)
            deletions:       2422 (2.43%)
            substitutions:   4945 (4.96%)
    translation failed:      1 (0.01%) strings, 7 (0.01%) symbols
    total string errors:     4882 (37.13%)
    total symbol errors:     9564 (9.59%)
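The summary percentages follow directly from the raw counts: word error rate is (insertions + deletions + substitutions) divided by the number of reference words, and sentence error rate is the fraction of utterances containing at least one error. The short check below recomputes a few of the reported figures from the counts listed above; it is only an arithmetic sanity check, not part of the training scripts.

```python
def error_rate(errors: int, total: int) -> float:
    """Error rate in percent: errors / total * 100."""
    return 100.0 * errors / total


# Kaldi nnet3 decode: WER = (insertions + deletions + substitutions) / reference words
ins, dels, subs, ref_words = 2226, 16007, 17963, 496128
kaldi_wer = error_rate(ins + dels + subs, ref_words)

# CMU Sphinx continuous model: word and sentence error rates
sphinx_wer = error_rate(89407, 496158)
sphinx_ser = error_rate(12906, 15093)

# Sequitur G2P: symbol errors are insertions + deletions + substitutions
g2p_symbol_err = error_rate(2190 + 2422 + 4945, 99753)

print(f"Kaldi WER {kaldi_wer:.2f}%")               # Kaldi WER 7.30%
print(f"Sphinx cont WER {sphinx_wer:.1f}%")        # Sphinx cont WER 18.0%
print(f"Sphinx cont SER {sphinx_ser:.1f}%")        # Sphinx cont SER 85.5%
print(f"G2P symbol errors {g2p_symbol_err:.2f}%")  # G2P symbol errors 9.58%
```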

--- (Edited on 10/30/2017 5:01 pm [GMT-0500] by guenter) ---

Re: New 160k words 1080 hours English models released
User: kmaclean
Date: 10/31/2017 7:44 am

>I have now applied the scripts I developed for those models to a combination of the english librispeech and voxforge corpora

Very impressive!

Thank you,

ken

--- (Edited on 10/31/2017 8:44 am [GMT-0400] by kmaclean) ---
