Frequently Asked Questions

What is a speech corpus or speech corpora?
User: kmaclean
Date: 1/1/2010 11:52 am
A Speech Corpus (or Spoken Corpus) is a database of speech audio files together with text transcriptions of those files, stored in a format that can be used to create Acoustic Models (which can then be used with a Speech Recognition Engine).  ISIP's Switchboard database is a good example of this.

A corpus is one such database.  Corpora is the plural of corpus (i.e., several such databases).

There are two types of Speech Corpora:

(1) Read Speech - which includes:

  • Book excerpts;
  • Broadcast news;
  • Lists of words;
  • Sequences of numbers.

(2) Spontaneous Speech - which includes:

  • Dialogs - between two or more people (includes meetings);
  • Narratives - a person telling a story;
  • Map-tasks - one person explains a route on a map to another;
  • Appointment-tasks - two people try to find a common meeting time based on individual schedules.
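
To make the definition above concrete, here is a minimal Python sketch of a corpus as a list of audio file / transcription pairs, each tagged as read or spontaneous speech. Everything in it is an assumption made for illustration: the names SpeechType, Utterance and load_corpus, the tab-separated index format, and the sample file names are hypothetical and do not reflect how Switchboard or any particular corpus is actually laid out.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class SpeechType(Enum):
    """The two broad corpus types described above."""
    READ = "read"
    SPONTANEOUS = "spontaneous"


@dataclass(frozen=True)
class Utterance:
    """One corpus entry: an audio file paired with its transcription."""
    audio_path: Path
    transcription: str
    speech_type: SpeechType


def load_corpus(index_lines, speech_type):
    """Parse lines of the (assumed) form 'audio_file<TAB>transcription'."""
    corpus = []
    for line in index_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the index
        audio_file, transcription = line.split("\t", 1)
        corpus.append(Utterance(Path(audio_file), transcription, speech_type))
    return corpus


# Hypothetical index for a small read-speech corpus of number sequences.
index = [
    "digits/0001.wav\tone two three",
    "digits/0002.wav\tfour five six",
]
corpus = load_corpus(index, SpeechType.READ)
for utt in corpus:
    print(utt.audio_path.name, "->", utt.transcription)
```

Real corpora usually carry extra metadata per recording (speaker, recording conditions, time alignments), but the audio/transcription pairing is the part an acoustic model trainer relies on.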


Re: What is a speech corpus or speech corpora?
User: atriokke
Date: 9/28/2012 8:03 pm
The hyperlink for Switchboard is returning a 404.