VoxForge
Google also had a useful set of data correlating speech samples with written words, culled from its free directory service, Goog411. People call the service and say the name of a city and state, and then say the name of a business or category. According to Mike Cohen, a Google research scientist, voice samples from this service were the main source of acoustic data for training the system.
Source: http://www.technologyreview.com/communications/21696/?nlid=1533&a=f
--- (Edited on 11/25/2008 1:24 pm [GMT-0600] by nsh) ---