Audio and Prompts Discussions

How to segment audio data manually? Any guidelines?
User: Binh
Date: 9/30/2013 4:46 am
Views: 4048
Rating: 7

Hi there,

I need some pointers on how to segment my audio data to minimize the following error.

ERROR: "main_align.c", line 765: Final state not reached; no alignment for wagner_mann03_dw_wagner_de_008

The reason I am asking is that I tried to segment the following German recording.

http://www.messe2media.com/files/rundumwagner.mp3

This picture shows how I cut it. As you can see, the speaker pauses between every sentence.

http://www.messe2media.com/files/AudacityCutting.jpg

I cut right in the middle of these pauses. This led to a lot of alignment errors when using SphinxTrain: almost 80% of my new data was rejected.
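
To make the cutting rule concrete, here is a minimal Python sketch of "cut in the middle of each pause". It is only an illustration of what I do by hand in Audacity: the function name, the frame length, the energy threshold and the minimum pause length are all assumptions chosen for the example, not values from SphinxTrain or the wiki.

```python
def find_cut_points(samples, rate, frame_ms=20, silence_thresh=500.0, min_pause_ms=300):
    """Return cut positions (in seconds) at the midpoint of each pause.

    samples: sequence of PCM sample values (e.g. 16-bit integers)
    rate:    sampling rate in Hz
    All other parameters are illustrative assumptions.
    """
    frame_len = max(1, int(rate * frame_ms / 1000))
    min_frames = max(1, int(min_pause_ms / frame_ms))

    # Mark each frame as silent if its RMS energy is below the threshold.
    silent = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
        silent.append(rms < silence_thresh)

    cuts = []
    run_start = None
    # The trailing False is a sentinel that closes a silent run at the end.
    for i, is_silent in enumerate(silent + [False]):
        if is_silent and run_start is None:
            run_start = i
        elif not is_silent and run_start is not None:
            run_len = i - run_start
            # Only pauses between speech count; leading and trailing
            # silence is skipped.
            is_inner = run_start > 0 and i < len(silent)
            if run_len >= min_frames and is_inner:
                mid_frame = run_start + run_len / 2
                cuts.append(mid_frame * frame_len / rate)
            run_start = None
    return cuts
```

Called on the decoded samples of a recording, this returns one timestamp per pause that is long enough, which is where I would place the cut.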

Segmented Data for Speaker 1:

http://www.messe2media.com/files/wagner_mann01_dw.tgz

Forced alignment helps a lot at this point, but since I am cutting the audio manually, I wonder whether I can minimize these problems by following some kind of "cutting guidelines".

So are there any general points I have to consider when segmenting audio for training? Something like "length should be between 5 and 10 seconds" (from your wiki).
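
A length rule like that one is easy to check automatically. The sketch below uses only Python's standard `wave` module and assumes the segments are uncompressed WAV files; the function names and the default 5 and 10 second bounds are placeholders taken from the example guideline above, not confirmed requirements.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def outside_range(paths, min_s=5.0, max_s=10.0):
    """Return (path, duration) pairs for segments outside [min_s, max_s]."""
    flagged = []
    for path in paths:
        duration = wav_duration_seconds(path)
        if not (min_s <= duration <= max_s):
            flagged.append((path, duration))
    return flagged
```

Running `outside_range` over the list of segment files would show which cuts are too short or too long before they go into training.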


Be aware that I am intentionally NOT sharing the training folder right now, because it is really big (the whole German VoxForge corpus) and because I am asking for more general pointers or "best practice" for segmenting audio for speech recognition training.


Binh

P.S. I posted the same request in Sphinx Help.

--- (Edited on 9/30/2013 4:46 am [GMT-0500] by Binh) ---

--- (Edited on 9/30/2013 4:47 am [GMT-0500] by Binh) ---
