Email sent to Udhyakumar Nallasamy:
I am the admin for the VoxForge website (www.voxforge.org). VoxForge is collecting transcribed speech audio from users to be used in the creation of GPL Acoustic Models for Free and Open Source Speech Recognition Engines such as Julius, Sphinx, ISIP and HTK.
David Gelbart mentioned that you have done some experimentation on the effects of MP3 coding on speech recognition. I was wondering if, in your opinion, increasing the amount of training data using MP3 speech audio (or other lossy audio formats like Ogg) might improve speech recognition performance. Or would we be better off just sticking with uncompressed (or losslessly compressed) audio in the creation of our Acoustic Models?
--- (Edited on 2/13/2007 3:14 pm [GMT-0500] by kmaclean) ---
Nice to hear from you. In my experiments (for TIMIT) I didn't find much
degradation of speech recognition accuracy with MP3 compression,
provided the training and test data are both MP3 compressed. However,
from a standards perspective it is better to stick to wav/shorten files,
as many speech toolkits don't yet handle MP3 directly.
Hope this helps,
--- (Edited on 2/13/2007 3:17 pm [GMT-0500] by kmaclean) ---