VoxForge
I'm not sure if this is a dumb question:
Could subtitled videos (with .srt files, for example) be used as a huge existing pool of transcribed speech? The extraneous annotations ([swish], [whisper], etc.) would have to be removed, but that seems like simple code.
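As a rough illustration of that cleanup step, here is a minimal Python sketch. It assumes the usual .srt layout (a cue number, a timing line such as 00:00:01,000 --> 00:00:04,000, then one or more text lines, with a blank line between cues) and assumes that non-speech annotations appear in square brackets or parentheses. The names parse_srt, clean_text and to_seconds are made up for this example and do not come from any VoxForge tool.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)
# Assumption: annotations look like "[swish]" or "(whisper)".
ANNOTATION = re.compile(r"\[[^\]]*\]|\([^)]*\)")
# Simple markup tags such as <i> and </i>.
TAG = re.compile(r"</?[^>]+>")


def to_seconds(h, m, s, ms):
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def clean_text(text):
    """Drop markup tags and annotations, then collapse whitespace."""
    text = TAG.sub("", text)
    text = ANNOTATION.sub(" ", text)
    return " ".join(text.split())


def parse_srt(srt_text):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # Find the timing line; skip blocks that do not have one.
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                break
        else:
            continue
        fields = match.groups()
        text = clean_text(" ".join(lines[i + 1:]))
        if text:  # skip cues that held only annotations
            start = to_seconds(*fields[:4])
            end = to_seconds(*fields[4:])
            cues.append((start, end, text))
    return cues
```

Each tuple pairs a time span with the cleaned text. Subtitle timings are often only loosely synchronized with the audio, so the spans would probably still need checking or alignment before the clips could be used as training data.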
If copyright laws would prevent the above, could a public domain source of transcribed audio be used? The English Parliament (from England) has an archive back to 2009 of their parliamentary debates in both houses, both televised and transcribed, with time markers every 2 minutes or so. (I looked at only a small sample, but there do seem to be many hours recorded.) I realize that it would not be ideal for American users to train the speech recognition software on English voices, but I'm sure there is a similarly progressive state legislature somewhere. Has this already been considered?
--- (Edited on 1/6/2011 8:30 pm [GMT-0600] by Visitor) ---
Hi SamA,
>I'm not sure if this is a dumb question:
Not at all.
>If copyright laws would prevent the above,
Copyright is the limiting factor for many of these sources.
>The English Parliament (from England) has an archive back to 2009 of their
>parliamentary debates in both houses,
Thanks for pointing out this additional source of recorded/transcribed speech.
I've been collecting a list of possible sources of speech audio here. I will add it there.
Ken
--- (Edited on 1/6/2011 11:08 pm [GMT-0500] by kmaclean) ---