VoxForge
From the MLCommons website:
The People’s Speech Dataset is the world’s largest labeled open speech dataset and includes 87,000+ hours of transcribed speech in 59 different languages with a diverse set of speakers. This open dataset is large enough to train speech-to-text systems and crucially will be available with a permissive license. Just as ImageNet catalyzed machine learning for vision, the People’s Speech will unleash innovation in speech research and products that are available to users across the globe.
From the TechCrunch article:
The People’s Speech Dataset was assembled from a variety of sources, with about 65,000 of its hours coming from audiobooks in English, with the text aligned with the audio. Then there are 15,000 hours or so sourced from around the web, with different acoustics, speakers and styles of speech (for example conversational instead of narrative). In addition, 1,500 hours of English audio were sourced from Wikipedia, and then 5,000 hours of synthetic speech of text generated by GPT-2 were mixed in (“A little bit of the snake eating its own tail,” joked Kanter). Fifty-nine languages in total are represented in some way, though as you can tell it is mostly English.
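The breakdown quoted above can be sanity-checked with simple arithmetic. The following is a minimal sketch in Python that uses only the rounded figures from the TechCrunch excerpt. The source labels in the dictionary are illustrative names chosen here, not official subset names from MLCommons.

```python
# Approximate hours per source, as quoted in the TechCrunch excerpt.
# The labels are illustrative assumptions, not official subset names.
approx_hours = {
    "English audiobooks": 65_000,
    "web audio": 15_000,
    "Wikipedia audio": 1_500,
    "synthetic speech (GPT-2 text)": 5_000,
}

# Sum the quoted figures and compute each source's share of that sum.
total = sum(approx_hours.values())
shares = {name: hours / total for name, hours in approx_hours.items()}

for name, hours in approx_hours.items():
    print(f"{name:<32}{hours:>8,} h  {shares[name]:6.1%}")
print(f"{'total':<32}{total:>8,} h")
```

The quoted figures sum to 86,500 hours, slightly under the "87,000+" headline number. Since the article gives the components only approximately ("about", "or so"), the gap is plausibly down to rounding. On these figures, English audiobooks account for roughly three quarters of the total.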