VoxForge
Email sent to Udhyakumar Nallasamy:
Hi Udhyakumar,
I am the admin for the VoxForge website (www.voxforge.org).
VoxForge is collecting transcribed speech audio from users to be used
in the creation of GPL Acoustic Models for Free and Open
Source Speech Recognition Engines such as Julius, Sphinx, ISIP and
HTK.
David Gelbart mentioned that you have done some experimentation on the
effects of MP3 coding on speech recognition. I was wondering if,
in your opinion, increasing the amount of training data using MP3 speech
audio (or other lossy audio formats like Ogg) might improve
speech recognition performance. Or would we be better off just
sticking with uncompressed (or lossless compressed) audio in the
creation of our Acoustic Models?
thanks,
Ken
--- (Edited on 2/13/2007 3:14 pm [GMT-0500] by kmaclean) ---
His reply:
Hi Ken,
Nice to hear from you. In my experiments (on TIMIT) I didn't find much
degradation of speech recognition accuracy with MP3 compression,
provided the training and test data are both MP3 compressed. However,
from a standards perspective it is better to stick to WAV/Shorten files,
as many speech toolkits don't yet handle MP3 directly.
Hope this helps,
Udhay
--- (Edited on 2/13/2007 3:17 pm [GMT-0500] by kmaclean) ---
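Udhay's two points (keep training and test audio in matching formats, and prefer plain WAV because toolkits may not read MP3) suggest a simple sanity check on a corpus directory before training. The following is only a rough sketch using Python's standard wave module. The function names and the "most common format wins" rule are my own assumptions for illustration, not something VoxForge or any of the toolkits prescribes.

```python
import wave
from collections import Counter
from pathlib import Path


def wav_format(path):
    """Return (channels, sample_width_bytes, sample_rate_hz) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        return (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())


def find_format_mismatches(corpus_dir):
    """List files that are not readable as PCM WAV, or whose format differs
    from the most common format in the directory (hypothetical rule)."""
    formats, unreadable = {}, []
    for path in sorted(Path(corpus_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            formats[path] = wav_format(path)
        except (wave.Error, EOFError):
            # MP3, Ogg, and other non-WAV files end up here.
            unreadable.append(path)
    if not formats:
        return unreadable
    # Treat the most frequent (channels, width, rate) triple as the reference.
    majority, _ = Counter(formats.values()).most_common(1)[0]
    return unreadable + [p for p, fmt in formats.items() if fmt != majority]
```

Running find_format_mismatches on both the training and the test directories, and comparing the results, would flag lossy files or odd sample rates before they reach the acoustic model training step. It does not detect audio that was MP3 compressed earlier and later decoded back to WAV.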