I know, how to get data for training

Acoustic Model Discussions

User: jflsbtovs64mduyg4ngt
Date: 11/13/2014 4:00 pm

Views: 7259
Rating: 3

1. Get a lot of clear audio with speech that is easy to recognize.
2. Split it into words.
3. Recognize the speech with a third-party engine.
4. Use the data for training.

Examples:

1) YouTube is a huge pile of transcribed audio. Many videos have ASR subtitles, and some even have manually created subtitles.
2) Many movies come with subtitles.
3) You can use third-party engines to recognize the audio from 1) and 2), for example Google's, Nuance's, Yandex's (free for 10k requests per day) and others.
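For the subtitle sources above, a rough sketch of how timed subtitles could be turned into (start, end, text) segments for cutting the audio. It assumes the subtitles are in SubRip (.srt) format, which is my assumption and not something the sources guarantee; `Cue` and `parse_srt` are hypothetical names made up for this example.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    text: str     # transcript for the segment


# SubRip timestamps look like 00:01:02,345 (some files use '.' instead of ',').
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    m = _TIME.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    h, mi, s, ms = (int(g) for g in m.groups())
    return h * 3600 + mi * 60 + s + ms / 1000.0


def parse_srt(srt_text: str) -> List[Cue]:
    """Parse SubRip text into a list of timed transcript segments."""
    cues: List[Cue] = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = [ln for ln in block.split("\n") if ln.strip()]
        # The timing line is the one containing '-->'.
        timing_idx = next((i for i, ln in enumerate(lines) if "-->" in ln), None)
        if timing_idx is None:
            continue  # not a cue block, skip it
        start_s, end_s = lines[timing_idx].split("-->")
        end_s = end_s.split()[0]  # drop any trailing position info
        text = " ".join(ln.strip() for ln in lines[timing_idx + 1:])
        if text:
            cues.append(Cue(_to_seconds(start_s), _to_seconds(end_s), text))
    return cues
```

Each `Cue` gives a time window and its transcript, so the audio can be cut per subtitle line. Getting down to individual words would still need a forced-alignment step, which this sketch does not cover.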

--- (Edited on 11/13/2014 4:00 pm [GMT-0600] by jflsbtovs64mduyg4ngt) ---

Re: I know, how to get data for training

User: kmaclean
Date: 11/13/2014 9:27 pm

Views: 1220
Rating: 3

Thanks for the suggestion, but these sources of speech usually have copyright issues.

--- (Edited on 11/13/2014 10:27 pm [GMT-0500] by kmaclean) ---

Re: I know, how to get data for training

User: Andrew767
Date: 3/17/2018 9:59 pm

Views: 2264
Rating: 0

I'd draw your attention to public government recordings such as Senate videos, which in most countries are free from copyright issues. Many have little ambient noise, and there are dozens of speakers for whom there are hours of recordings - e.g. when filibustering.
Also, it has been eight years since the discussion about compressed & uncompressed recordings. During the 1990s high compression rates were still essential, but cheaper and greater bandwidth and storage capacity have facilitated better quality recordings during the past decade.
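To put the storage point in numbers, here is a back-of-the-envelope sketch of the size of uncompressed PCM audio. The defaults (16 kHz, 16-bit, mono) are my assumption for illustration, not figures from this thread, and `pcm_size_bytes` is a made-up helper name.

```python
def pcm_size_bytes(duration_s: float,
                   sample_rate_hz: int = 16000,   # assumed sample rate
                   bits_per_sample: int = 16,     # assumed bit depth
                   channels: int = 1) -> int:     # assumed mono
    """Size of raw (headerless) PCM audio in bytes."""
    samples_per_channel = round(duration_s * sample_rate_hz)
    bytes_per_sample = bits_per_sample // 8
    return samples_per_channel * channels * bytes_per_sample


one_hour = pcm_size_bytes(3600)
print(f"{one_hour / 1e6:.1f} MB per hour")  # prints "115.2 MB per hour"
```

Under those assumptions an hour of uncompressed speech is roughly 115 MB, so even hundreds of hours fit on ordinary disks today.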

--- (Edited on 3/17/2018 9:59 pm [GMT-0500] by ) ---
