Re: More on Collecting Speech Audio for Free GPL Speech Corpus

I was reading the various messages on this topic and decided to add my 2 cents to the debate.

According to wave theory (physics), waves are superposable, so the result of a superposition is a complex wave. The Fourier transform is based on this principle: it decomposes the compound wave into a spectrum of the sinusoidal waves, with their amplitudes, that make it up.
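
To make the superposition idea concrete, here is a rough sketch (mine, not taken from any existing tool): two sinusoids are added sample by sample, and a plain discrete Fourier transform recovers their frequencies and amplitudes from the compound wave. The window length, frequencies and amplitudes are made-up values, and `dft_magnitudes` is just an illustrative helper name.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return the amplitude found in each frequency bin (0 .. N/2) of a real signal."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # Scale so that a sinusoid of amplitude A shows up as A in its own bin.
        scale = 1 / n if k in (0, n // 2) else 2 / n
        mags.append(abs(acc) * scale)
    return mags

N = 64  # samples in one analysis window (assumed value)

# Two sinusoids with 3 and 7 cycles per window, amplitudes 1.0 and 0.5.
wave_a = [1.0 * math.sin(2 * math.pi * 3 * t / N) for t in range(N)]
wave_b = [0.5 * math.sin(2 * math.pi * 7 * t / N) for t in range(N)]

# Superposition: the compound wave is the sample-by-sample sum.
compound = [a + b for a, b in zip(wave_a, wave_b)]

spectrum = dft_magnitudes(compound)
peaks = [k for k, m in enumerate(spectrum) if m > 0.01]
print(peaks, spectrum[3], spectrum[7])
```

The only bins with noticeable energy are the two that were put in, which is the "spectrum of the compound wave" in miniature. For real audio you would use an FFT library rather than this slow loop.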

Following the theory, the principle works both ways: addition AND subtraction of waves. Thus the problem is not recording noisy speech but how to effectively subtract the unwanted noise, background info, etc.

The ideal trick for this job is to have 2 sound inputs, one close to the mouth, to capture the voice, and another one away from it, to capture the background sounds.

Then subtract. This might not be a reality yet, but I am convinced that it will come... we are actually working on it...
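
A toy sketch of the two-input idea, under heavily idealized assumptions that are mine: the far input hears only the background, the close input hears the voice plus a scaled copy of that same background, and there is no delay or room echo between the two. The gain is estimated by a one-tap least-squares fit, which I am swapping in for illustration; it is not a description of what we are building. All signals are synthetic sinusoids and the names are hypothetical.

```python
import math

def subtract_reference(primary, reference):
    """Remove from `primary` the part that is a scaled copy of `reference`."""
    # One-tap least-squares gain: how strongly the reference leaks into the primary.
    gain = sum(p * r for p, r in zip(primary, reference)) / sum(r * r for r in reference)
    cleaned = [p - gain * r for p, r in zip(primary, reference)]
    return cleaned, gain

N = 400  # number of samples (assumed value)
speech = [math.sin(2 * math.pi * 5 * t / N) for t in range(N)]        # stand-in for the voice
noise = [0.8 * math.sin(2 * math.pi * 50 * t / N) for t in range(N)]  # stand-in for the background

close_mic = [s + 0.6 * n for s, n in zip(speech, noise)]  # voice plus attenuated background
far_mic = noise                                           # background only (idealized)

cleaned, gain = subtract_reference(close_mic, far_mic)
max_error = max(abs(c - s) for c, s in zip(cleaned, speech))
print(gain, max_error)
```

In this clean setup the estimated gain matches the leak that was put in and the voice comes back almost exactly. Real recordings are harder: the background reaches the two inputs with different delays and colouring, and some voice leaks into the far input, so a single fixed gain is not enough.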

Thus in my opinion the Corpus should only contain CLEAN stuff.

Which, by the way, can lead to another interesting approach: the corpus data can be "polluted" with background noises (as a pre-processing step, by applying the same wave theory) and then matched against the incoming waves... which effectively is what is being suggested in one of the previous posts!
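
The "pollution" step could look roughly like this: background noise is scaled and added to a clean recording so that the mix lands at a chosen signal-to-noise ratio. This is only a sketch; `mix_at_snr`, the 10 dB target and the synthetic signals are my own illustrative choices, not something taken from the earlier posts.

```python
import math

def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)

def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mix has the requested SNR in dB."""
    target_noise_power = power(clean) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    return [c + scale * n for c, n in zip(clean, noise)]

N = 400  # number of samples (assumed value)
clean = [math.sin(2 * math.pi * 5 * t / N) for t in range(N)]         # stand-in for clean corpus audio
noise = [math.sin(2 * math.pi * 50 * t / N + 0.3) for t in range(N)]  # stand-in for background noise

noisy = mix_at_snr(clean, noise, snr_db=10)

# Check: recover what was added and measure the SNR that was actually achieved.
added = [y - c for y, c in zip(noisy, clean)]
achieved_snr_db = 10 * math.log10(power(clean) / power(added))
print(achieved_snr_db)
```

One clean corpus can then be turned into many noisy variants (different noises, different SNRs) without ever recording noisy speech.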

Essentially we need to treat 2 channels independently: the voice channel AND the background channel... I'm convinced that this is the holy grail.

serial_strat

--- (Edited on 4/ 3/2007 10:16 am [GMT-0500] by Visitor) ---