Sampling rate and Nyquist frequency
User: kmaclean
Date: 2/27/2008 2:04 pm

With respect to using higher sampling rates for speech, the following excerpt is very helpful. It is from the second-edition draft chapters of Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James H. Martin. (I don't think the draft chapters are online anymore, but the book is well worth the price if you are interested in speech recognition.)

Recall that the first step in processing speech is to convert the analog representations (first air pressure, and then analog electric signals in a microphone), into a digital signal. This process of analog-to-digital conversion has two steps: sampling and quantization. A signal is sampled by measuring its amplitude at a particular time; the sampling rate is the number of samples taken per second. In order to accurately measure a wave, it is necessary to have at least two samples in each cycle: one measuring the positive part of the wave and one measuring the negative part.

More than two samples per cycle increases the amplitude accuracy, but less than two samples will cause the frequency of the wave to be completely missed. Thus the maximum frequency wave that can be measured is one whose frequency is half the sample rate (since every cycle needs two samples). This maximum frequency for a given sampling rate is called the Nyquist frequency.
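As an aside (this sketch is mine, not part of the book excerpt), the Nyquist limit can be checked numerically. The function names below are made up for illustration, and the folding formula assumes an ideal sampler with no anti-aliasing filter in front of it. A tone above the Nyquist frequency is not simply lost: its samples become indistinguishable from those of a lower-frequency tone.

```python
import math


def nyquist_frequency(sample_rate_hz):
    """Highest frequency that can be represented at this sampling rate."""
    return sample_rate_hz / 2.0


def apparent_frequency(freq_hz, sample_rate_hz):
    """Frequency a pure tone appears to have once sampled (aliasing/folding).

    Assumes an ideal sampler with no anti-aliasing filter.
    """
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Sample a unit-amplitude sine wave at the given rate."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


if __name__ == "__main__":
    rate = 8000  # telephone-bandwidth sampling rate, in Hz
    print(nyquist_frequency(rate))         # 4000.0
    print(apparent_frequency(3000, rate))  # 3000 (below Nyquist, unchanged)
    print(apparent_frequency(5000, rate))  # 3000 (above Nyquist, folds down)

    # The 5000 Hz samples are the 3000 Hz samples with the sign flipped,
    # so after sampling the two tones cannot be told apart by frequency.
    low = sample_sine(3000, rate, 16)
    high = sample_sine(5000, rate, 16)
    print(all(abs(a + b) < 1e-9 for a, b in zip(low, high)))
```

At an 8,000 Hz sampling rate, a 5,000 Hz tone produces the same sample magnitudes as a 3,000 Hz tone, which is why anything above half the sampling rate has to be filtered out before sampling.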

Most information in human speech is in frequencies below 10,000 Hz; thus a 20,000 Hz sampling rate would be necessary for complete accuracy. But telephone speech is filtered by the switching network, and only frequencies less than 4,000 Hz are transmitted by telephones. Thus an 8,000 Hz sampling rate is sufficient for telephone-bandwidth speech like the Switchboard corpus. A 16,000 Hz sampling rate (sometimes called wideband) is often used for microphone speech.

Even an 8,000 Hz sampling rate requires 8000 amplitude measurements for each second of speech, and so it is important to store the amplitude measurement efficiently. They are usually stored as integers, either 8-bit (values from -128 to 127) or 16-bit (values from -32768 to 32767). This process of representing real-valued numbers as integers is called quantization because there is a minimum granularity (the quantum size) and all values which are closer together than this quantum size are represented identically.
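To make the quantization step concrete, here is a minimal sketch of my own (not from the book). It assumes amplitudes are already normalized to the range -1.0 to 1.0 and scales them by the largest positive integer, so the most negative value (-128 or -32768) is never produced. That is one common convention, not the only one, and the function names are made up for illustration.

```python
def quantize(amplitude, bits=16):
    """Map a real amplitude in [-1.0, 1.0] to a signed integer.

    Assumption: symmetric scaling by the largest positive value
    (127 for 8-bit, 32767 for 16-bit); out-of-range input is clipped.
    """
    max_int = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, amplitude))
    return int(round(clipped * max_int))


def bytes_per_second(sample_rate_hz, bits):
    """Uncompressed storage needed for one second of mono audio."""
    return sample_rate_hz * bits // 8


if __name__ == "__main__":
    # Two nearby amplitudes fall into the same 8-bit step...
    print(quantize(0.100, bits=8), quantize(0.101, bits=8))
    # ...but the finer 16-bit steps keep them apart.
    print(quantize(0.100, bits=16), quantize(0.101, bits=16))

    # Storage cost of one second of telephone-bandwidth speech.
    print(bytes_per_second(8000, 8), bytes_per_second(8000, 16))
```

The two amplitudes 0.100 and 0.101 are closer together than the 8-bit quantum size, so they are stored as the same integer, while 16-bit samples distinguish them at twice the storage cost.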
