VoxForge
Hi there,
I need some pointers on how to segment my audio data to minimize the following error:
ERROR: "main_align.c", line 765: Final state not reached; no alignment for wagner_mann03_dw_wagner_de_008
I am asking because I tried to segment the following German text:
http://www.messe2media.com/files/rundumwagner.mp3
This picture shows how I cut it. As you can see, the speaker pauses between every sentence.
http://www.messe2media.com/files/AudacityCutting.jpg
I cut right in the middle of these pauses. This led to a lot of alignment errors when using SphinxTrain. Almost 80% of my new data got rejected.
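To make the cutting rule concrete, here is a minimal Python sketch of what I did by hand in Audacity. It assumes the pauses have already been located as (start, end) times in seconds; the function names and the example numbers are made up for illustration and are not taken from my actual data.

```python
def cut_points(pauses):
    """Return one cut time per pause, placed at the pause midpoint.

    `pauses` is a list of (start_sec, end_sec) tuples (hypothetical input,
    e.g. read off the waveform by eye).
    """
    return [(start + end) / 2.0 for start, end in pauses]


def segments(total_sec, cuts):
    """Turn sorted cut times into (start_sec, end_sec) segments."""
    bounds = [0.0] + sorted(cuts) + [total_sec]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # Two pauses in a 15 second recording (illustrative values only).
    cuts = cut_points([(4.0, 5.0), (9.5, 10.5)])
    for start, end in segments(15.0, cuts):
        print("%.2f - %.2f" % (start, end))
```

Cutting at the midpoint leaves half of each pause as silence at the end of one segment and half at the start of the next.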
Segmented Data for Speaker 1:
http://www.messe2media.com/files/wagner_mann01_dw.tgz
Forced alignment helps a lot at this point, but since I am cutting the audio manually, I wonder if I can minimize these problems by following some kind of "cutting guidelines".
So are there any general points I have to consider while segmenting audio for training? Something like "length should be between 5 and 10 seconds" (from your wiki).
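For what it's worth, a check like the following could flag segments that fall outside that length range. It is only a sketch: the function names are my own, the 5 and 10 second limits are just the wiki numbers quoted above, and it assumes plain PCM WAV files that Python's standard `wave` module can read.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def out_of_range(durations, min_sec=5.0, max_sec=10.0):
    """Return the sorted names of segments shorter than min_sec or longer than max_sec.

    `durations` maps a segment name to its length in seconds.
    """
    return sorted(name for name, sec in durations.items()
                  if sec < min_sec or sec > max_sec)
```

Usage would be to build a dictionary of segment name to `wav_duration_seconds(path)` for every cut file and then review whatever `out_of_range` returns before training.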
Be aware that I am intentionally NOT sharing the training folder right now, because it is really big (the whole German VoxForge corpus) and because I am asking for more general pointers or "best practices" for segmenting audio for speech recognition training.
Binh
P.S. I posted the same request in Sphinx Help.
--- (Edited on 9/30/2013 4:46 am [GMT-0500] by Binh) ---
--- (Edited on 9/30/2013 4:47 am [GMT-0500] by Binh) ---