Re: Contributing Audio and Phonemes to the German Model

German

User: nsh
Date: 2/14/2014 5:19 am

Views: 42
Rating: 1

> I have created a new, updated German audio model for CMU Sphinx (http://goofy.zamia.org/voxforge/de/) - this is based on our submission rating/tagging effort. Currently I am still polishing the model, but once it is ready for use I will open a new thread about it in the German forum. Would you be willing to host the new model on the official VoxForge servers somewhere?

It's better to use sphinxbase/sphinxtrain trunk to train such a model; it creates significantly more accurate models that are also noise-robust.

It's also better to train models with the LDA/MLLT transform; they are usually 20% more accurate.

For the models, it's also better to provide test results so that others can reproduce them and estimate model accuracy. For accurate test results, it's better to have separate speakers in the test set; currently the model is overtrained on the speakers who contributed the majority of the recordings in the database.

Re: Contributing Audio and Phonemes to the German Model

User: guenter
Date: 2/14/2014 1:10 pm

Views: 159
Rating: 0

Hey - thanks for all the helpful advice, very much appreciated as I am still new to the field.

> For the models, it's also better to provide test results so that others can reproduce them and estimate model accuracy.

Results are included in the tarball now. I have also uploaded some more statistics here:

http://goofy.zamia.org/voxforge/de/audio-stats.txt

These are basically the results of running all submissions through pocketsphinx_batch again and calculating word errors per user.
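For readers who want to reproduce this kind of per-user statistic, here is a minimal sketch of how word errors per speaker could be computed from reference and decoded transcripts. It is not the script used for audio-stats.txt; the function names, the tuple layout, and the sample sentences are all made up for illustration.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1], len(ref)


def wer_per_speaker(results):
    """results: iterable of (speaker, reference, hypothesis) tuples.

    Returns a dict mapping each speaker to errors / reference words.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for speaker, reference, hypothesis in results:
        e, n = word_errors(reference, hypothesis)
        errors[speaker] += e
        words[speaker] += n
    return {s: errors[s] / words[s] for s in words if words[s] > 0}


if __name__ == "__main__":
    # Hypothetical transcripts, only to show the call shape.
    sample = [
        ("speaker_a", "das ist ein test", "das ist ein test"),
        ("speaker_b", "das ist ein test", "das ist test"),
    ]
    for speaker, wer in sorted(wer_per_speaker(sample).items()):
        print(f"{speaker}: {wer:.1%}")
```

Errors are summed per speaker before dividing, so long utterances weigh more than short ones; averaging per-utterance rates instead would give different numbers.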

> It's better to use sphinxbase/sphinxtrain trunk to train such a model; it creates significantly more accurate models that are also noise-robust.

I never realized the released versions were that old while SVN is so busy. Do you happen to know if they're planning to have a fresh release anytime soon? Anyway, I have downloaded, compiled and installed SVN trunk, so the next model will be built using those versions. Will the model I generate still work in the latest stable pocketsphinx release?

> It's also better to train models with the LDA/MLLT transform; they are usually 20% more accurate.

Interesting! I will definitely give that a go. So this basically means just following the instructions given at

http://cmusphinx.sourceforge.net/wiki/ldamllt

> For accurate test results, it's better to have separate speakers in the test set; currently the model is overtrained on the speakers who contributed the majority of the recordings in the database.

Here I am not sure how we would implement this given the imbalance in the data we have. I don't want to exclude any of the big submissions, as we have only very few of them and they ensure the model is somewhat useful. But then I am left with only very sparse submissions by other speakers, all of which I'd essentially have to use for the test set to have a reasonable number of test submissions.

At the moment, I am using every 10th submission in the test set. I see your point, but I guess it must suffice for now. Once we have larger submissions from other speakers (I do plan to start generating submissions based on LibriVox material), we should be able to reserve some voices exclusively for the test set.

Re: Contributing Audio and Phonemes to the German Model

User: guenter
Date: 2/15/2014 11:09 am

Views: 95
Rating: 0

The new model with LDA/MLLT enabled is uploaded; the word error rate dropped to 1.9%! :)

Re: Contributing Audio and Phonemes to the German Model

User: nsh
Date: 2/20/2014 4:32 pm

Views: 40
Rating: 0

This is a great result, congratulations. I still suggest that you exclude your recordings and ralf's recordings from the test set. Use only other speakers in the test set. You can still use your own recordings in the training set. That will probably give you less attractive numbers, but it will be an honest estimate and, more importantly, you will be able to optimize model parameters properly.
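A rough sketch of such a speaker-disjoint split follows. It assumes submission names of the form `<speaker>-<date>-<id>.tgz` (as in the VoxForge uploads in this thread) and user names without hyphens; the helper names, the `test_fraction` parameter, and every file name in the example except the `guenter-` ones are hypothetical.

```python
import random
from collections import defaultdict


def speaker_of(submission):
    # Assumption: the speaker is everything before the first "-".
    return submission.split("-", 1)[0]


def split_by_speaker(submissions, train_only, test_fraction=0.5, seed=0):
    """Split submissions so that no speaker appears in both sets.

    Speakers listed in train_only never go to the test set. A fraction of
    the remaining speakers is held out completely for testing.
    """
    by_speaker = defaultdict(list)
    for name in submissions:
        by_speaker[speaker_of(name)].append(name)

    candidates = sorted(s for s in by_speaker if s not in train_only)
    random.Random(seed).shuffle(candidates)
    n_test = max(1, round(len(candidates) * test_fraction)) if candidates else 0
    test_speakers = set(candidates[:n_test])

    train, test = [], []
    for speaker, names in by_speaker.items():
        (test if speaker in test_speakers else train).extend(names)
    return sorted(train), sorted(test)


if __name__ == "__main__":
    submissions = [
        "guenter-20140204-afn.tgz",
        "guenter-20140205-qah.tgz",
        "ralf-20130101-aaa.tgz",   # hypothetical file name
        "anna-20130102-bbb.tgz",   # hypothetical speaker
        "bert-20130103-ccc.tgz",   # hypothetical speaker
    ]
    train, test = split_by_speaker(submissions, train_only={"guenter", "ralf"})
    print("train:", train)
    print("test: ", test)
```

Splitting at the speaker level rather than taking every Nth submission means the test set contains only voices the model has never seen, which is what makes the resulting number usable for tuning.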

Currently, a lot of recorded and transcribed data is available online, such as TED talks, podcasts and LibriVox books. This data is more practically useful than VoxForge recordings. So the biggest return would come from alignment algorithms like the long audio alignment in sphinx4, not from VoxForge. It's way better to have 3000 hours than 30 hours.

Re: Contributing Audio and Phonemes to the German Model

User: nsh
Date: 2/20/2014 4:38 pm

Views: 108
Rating: 0

> they're planning to have a fresh release anytime soon?

Yes, I'm preparing a fresh release now.

> the model I generate still work in the latest stable pocketsphinx release?

The models will work, but it's better to use the new code. The updated decoder also has a few critical features.

Re: Contributing Audio and Phonemes to the German Model

User: guenter
Date: 3/12/2014 5:20 am

Views: 47
Rating: 0

ken,

I have uploaded another bunch of files to the VoxForge FTP server (which should all contain an updated license file :) ) - could you add them to the German audio corpus?

guenter-20140204-afn.tgz
guenter-20140204-afq.tgz
guenter-20140204-ftr.tgz
guenter-20140204-ofp.tgz
guenter-20140204-xck.tgz
guenter-20140205-afn.tgz
guenter-20140205-afq.tgz
guenter-20140205-qah.tgz
guenter-20140205-xck.tgz
guenter-20140206-afn.tgz
guenter-20140206-ftr.tgz
guenter-20140206-qah.tgz
guenter-20140206-xck.tgz
guenter-20140207-afn.tgz
guenter-20140207-afq.tgz
guenter-20140207-ftr.tgz
guenter-20140207-ofp.tgz
guenter-20140207-qah.tgz
guenter-20140207-vau.tgz
guenter-20140207-xck.tgz
guenter-20140208-ftr.tgz
guenter-20140208-qah.tgz
guenter-20140209-afn.tgz
guenter-20140209-ftr.tgz
guenter-20140209-qah.tgz
guenter-20140209-xck.tgz
guenter-20140211-afn.tgz
guenter-20140211-afq.tgz
guenter-20140211-ftr.tgz
guenter-20140211-ofp.tgz
guenter-20140211-qah.tgz
guenter-20140211-vau.tgz
guenter-20140211-xck.tgz
guenter-20140212-qah.tgz
guenter-20140213-ftr.tgz
guenter-20140213-qah.tgz
guenter-20140213-xck.tgz
guenter-20140214-afn.tgz
guenter-20140214-afq.tgz
guenter-20140214-ftr.tgz
guenter-20140214-ofp.tgz
guenter-20140214-qah.tgz
guenter-20140214-xck.tgz
guenter-20140215-qah.tgz
guenter-20140217-afn.tgz
guenter-20140217-ftr.tgz
guenter-20140217-qah.tgz
guenter-20140217-xck.tgz
guenter-20140218-ftr.tgz
guenter-20140218-qah.tgz
guenter-20140218-xck.tgz
guenter-20140224-ftr.tgz
guenter-20140224-qah.tgz
guenter-20140309-qah.tgz
guenter-20140310-ftr.tgz
guenter-20140310-qah.tgz

thanks,

guenter

Re: Contributing Audio and Phonemes to the German Model

User: Binh
Date: 3/13/2014 4:50 am

Views: 3186
Rating: 0

> That will probably give you less attractive numbers, but it will be an honest estimate

I agree. To put those numbers into perspective: we are testing on interviews recorded at conventions. When we test a new video, we only add the missing words to the decoding dictionary.

We are hovering around word error rates of 90%.

This number seems high, but you have to take into account that convention videos are among the worst possible candidates.

The audio is hardly "clean" even if you use a handheld microphone. Add to that a completely unknown speaker, a possible accent, no clear sentence structure since it's an interview, etc.
