VoxForge
--- (Edited on 3/ 2/2007 4:05 pm [GMT-0500] by kmaclean) ---
More from David Gelbart:
Regarding forced alignment, I think you will find this paper of
interest:
Evaluating Factors Impacting the Accuracy of Forced Alignments...
Lei Chen, Yang Liu, Mary Harper, Eduardo Maia, Susan McRoy
http://citeseer.ist.psu.edu/724385.html
They compare forced alignment performance of HTK and ISIP. They also
mention at least one downloadable system that can be used for
alignment, the HTK-based Aligner. I wonder if the trained models for
their WSJ-HTK or SWB-ISIP systems can be downloaded too.
They also considered the impact of doing segmentation before
alignment, and found it helpful.
It looks like Lei Chen is still at Purdue:
http://cobweb.ecn.purdue.edu/~chenl
Yang Liu is now in Dallas:
http://www.hlt.utdallas.edu/~yangl/
I didn't check up on the other authors. I know Yang from ICSI; she's
very nice.
Feel free to quote from this mail (or my previous mail) on the
VoxForge forums if you like.
--- (Edited on 3/ 2/2007 4:06 pm [GMT-0500] by kmaclean) ---
--- (Edited on 3/ 2/2007 4:06 pm [GMT-0500] by kmaclean) ---
--- (Edited on 3/ 2/2007 4:07 pm [GMT-0500] by kmaclean) ---
> My sense is that running any forced alignment on a large file would take way too long.
There are many papers describing training on data that wasn't transcribed for recognition. To do so, some form of forced alignment must have been used to split the data into chunks suitable for forward-backward training. I'd suggest that it would be worth testing your assumption, particularly as LibriVox data is very clean and so should segment and align quite easily.
Tony
P.S. Is there some way to "subscribe" or otherwise be notified of all recent posts to every forum on this site?
--- (Edited on 3/ 4/2007 1:59 pm [GMT-0600] by Tony Robinson) ---
Hi Tony,
Excellent! Thanks for the info regarding forced alignment.
With respect to subscribing to a forum, after you click on the title of a forum on the Forums page, a list of threads appears. At the very top of this forum page there should be a subscribe link.
Notes:
Ken
--- (Edited on 3/ 4/2007 4:01 pm [GMT-0500] by kmaclean) ---
--- (Edited on 3/ 6/2007 10:22 am [GMT-0500] by kmaclean) ---
Hi David,
Thanks again for your feedback on this.
The main conclusion from the paper you cite is this:
From this study, we found that segmenting the speech files prior to alignment improves the overall alignment accuracy and that alignment accuracy is enhanced by using more advanced acoustic models and more training data matched on speaking style (conversational versus planned) to the data to be aligned. Speaker adaptation improves the models somewhat, but more so for the weaker models.
I am really only looking to segment the audio and text data into 5-10 second snippets - I don't need word-level or phone-level time alignments.
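If a sentence-level alignment is available, cutting it into snippets of this size is a simple greedy packing step. The sketch below is only an illustration: it assumes the aligner has already produced (start, end, text) triples per sentence, in seconds, and the function name pack_snippets and the 10-second cap are hypothetical choices. It only ever cuts at sentence boundaries, so a single sentence longer than the cap becomes its own oversized snippet.

```python
from typing import List, Optional, Tuple

# (start_seconds, end_seconds, text) for one aligned sentence or snippet
Sentence = Tuple[float, float, str]


def pack_snippets(sentences: List[Sentence],
                  max_len: float = 10.0) -> List[Sentence]:
    """Greedily merge consecutive sentences into snippets no longer than
    max_len seconds, cutting only at sentence boundaries (hypothetical helper)."""
    snippets: List[Sentence] = []
    start: Optional[float] = None
    end = 0.0
    texts: List[str] = []
    for s_start, s_end, text in sentences:
        # Close the open snippet if this sentence would push it past max_len.
        if start is not None and s_end - start > max_len:
            snippets.append((start, end, " ".join(texts)))
            start = None
        if start is None:
            start, texts = s_start, []
        end = s_end
        texts.append(text)
    if start is not None:
        snippets.append((start, end, " ".join(texts)))
    return snippets
```

The resulting start/end times can then be used to cut the WAV file and the matching text into snippet pairs.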
I have done some experiments with HTK. As you indicated, the run time for forced alignment is not an issue (less than 1 minute for a 60 MB WAV file). Unfortunately, HVite forced alignment using the full text does not yield acceptable results. It may be that I need to insert silence phones at the end of each sentence and paragraph; I will try that next.
Next I will try to segment the data by paragraph (manually or using Julius' adintool utility) as described above, and then use forced alignment to determine where the sentence boundaries are. If that doesn't work, I'll try speaker adaptation with part of the LibriVox audio book submission, since I am limited to the VoxForge acoustic model for HTK and Julius experimentation. Julian does not seem to handle long WAV files (longer than 10 seconds).
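One way to try the silence idea is to generate the word-level transcription that HVite aligns against with an explicit silence token after every sentence. The sketch below writes an HTK master label file (MLF); it assumes a pronunciation dictionary whose headwords are upper-case and which contains an entry for the silence token (here sil, a hypothetical name that must match your own dictionary). The sentence splitting is deliberately naive, and text_to_mlf is an illustrative name, not part of HTK.

```python
import re
from typing import List


def text_to_mlf(text: str, lab_name: str, sil_token: str = "sil") -> str:
    """Build a word-level HTK MLF for forced alignment, with sil_token at
    the start and after every sentence (which also covers paragraph ends)."""
    lines: List[str] = ["#!MLF!#", f'"*/{lab_name}.lab"', sil_token]
    # Paragraphs are separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for para in paragraphs:
        # Naive sentence split: after . ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", para.strip()):
            words = re.findall(r"[A-Za-z']+", sentence)
            if not words:
                continue
            lines.extend(w.upper() for w in words)
            lines.append(sil_token)
    lines.append(".")  # MLF entry terminator
    return "\n".join(lines) + "\n"
```

Whether explicit silences actually fix the long-file alignment is exactly what the experiment would have to show; this only automates building the transcription.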
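For the paragraph-level pre-segmentation step, a crude energy-based splitter shows the general idea of cutting audio at long pauses. This is not adintool's actual algorithm; the function name split_on_silence, the frame size, the RMS threshold (in 16-bit sample units) and the minimum pause length are all assumed values that would need tuning on real recordings.

```python
import math
from typing import List, Sequence, Tuple


def split_on_silence(samples: Sequence[int], rate: int,
                     frame_ms: int = 20, threshold: float = 500.0,
                     min_sil_ms: int = 300) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) spans of speech, splitting wherever
    frame RMS energy stays below threshold for at least min_sil_ms."""
    frame = rate * frame_ms // 1000
    n_frames = len(samples) // frame
    voiced = []
    for i in range(n_frames):
        chunk = samples[i * frame:(i + 1) * frame]
        rms = math.sqrt(sum(s * s for s in chunk) / frame)
        voiced.append(rms >= threshold)

    min_sil = max(1, min_sil_ms // frame_ms)  # pause length in frames
    segments: List[Tuple[int, int]] = []
    start, sil_run = None, 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i
            sil_run = 0
        elif start is not None:
            sil_run += 1
            if sil_run >= min_sil:
                # End the segment at the first silent frame of this pause.
                end = i - sil_run + 1
                segments.append((start * frame, end * frame))
                start, sil_run = None, 0
    if start is not None:
        # Trim any short trailing silence from the final segment.
        segments.append((start * frame, (n_frames - sil_run) * frame))
    return segments
```

For a 16-bit mono WAV file, array.array("h", w.readframes(w.getnframes())) on a wave.open(...) handle gives suitable samples on a little-endian machine; dividing the returned offsets by the sample rate gives times in seconds.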
If none of these work, then I may need to look at using ISIP alignment tools or Sphinx-align, since they use more robust Acoustic Models. Or simply perform manual sentence segmentation until the VoxForge AM gets robust enough to be useful for Forced Alignment Segmentation.
Ken
--- (Edited on 3/12/2007 12:41 am [GMT-0400] by kmaclean) ---
Hi -
First of all, please excuse my ignorance and the slightly off-topic nature of this post.
I'm a researcher in neuro/computational linguistics (but not the speech variety, unfortunately), and am looking for some way to time-annotate plain text relative to a speech file. I do EEG research, and will be recording the electrical activity on people's scalps (aka "brainwaves") while they listen to a recorded spoken text. Afterwards, the EEG file has to be aligned to the written text at the word level, so we can trace brain responses to individual words. So I would need output along the lines of:
[text] [audio-file]
The 0-232ms
lazy 233-507ms
brown 598-789ms
fox 790-1012ms
...
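If the alignment is done with HTK, HVite writes word boundaries in its output label file in units of 100 ns, so producing a word/millisecond listing like the one above is mostly a matter of unit conversion. The sketch below assumes a word-level (not phone-level) output MLF with one "start end word [score]" line per label; mlf_word_times is a hypothetical helper, and the set of skipped silence labels is an assumption to adjust to your own dictionary.

```python
from typing import List, Tuple

HTK_UNITS_PER_MS = 10_000  # HTK label times are in 100 ns units


def mlf_word_times(mlf_text: str,
                   skip: Tuple[str, ...] = ("sil", "sp", "SENT-START", "SENT-END"),
                   ) -> List[Tuple[str, int, int]]:
    """Extract (word, start_ms, end_ms) triples from a word-level HTK MLF."""
    words = []
    for line in mlf_text.splitlines():
        fields = line.split()
        # Label lines look like: "<start> <end> <word> [score]";
        # headers, file names and "." terminators are skipped.
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            start, end, word = int(fields[0]), int(fields[1]), fields[2]
            if word not in skip:
                words.append((word, start // HTK_UNITS_PER_MS,
                              end // HTK_UNITS_PER_MS))
    return words
```

Printing f"{word.lower()} {start}-{end}ms" for each triple gives output in the format shown above.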
I figured the best way to do this would be to force-align an audio book with its text. The book I would like to use is Beatrix Potter's Peter Rabbit: http://www.gutenberg.org/etext/12702.
I saw that you've been discussing how long segments can be, and what level of training/tweaking is required to get acceptable results. Is it at all feasible to force-align longer stretches of audio-book text, of around 10 minutes? (I guess the recordings are pretty clean compared to, say, phone transcripts and the like.) How much do things like the speaker's accent and plain-text formatting matter?
Any suggestions welcome!
Brian Murphy
CIMeC, University of Trento, Italy
--- (Edited on 3/27/2007 4:07 am [GMT-0500] by mofei) ---
Hi Brian,
There should be no problem performing a forced alignment of ten minutes of audio. You do need the audio and the text to correspond (i.e., plain text with no words missing or extra), a pronunciation for each word, and pre-existing acoustic models. Accent isn't too much of an issue.
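Before aligning, it can help to check the pronunciation requirement mechanically, i.e. that every transcript word has a dictionary entry. The sketch below assumes a plain-text lexicon with the headword as the first field on each line (as in HTK-style and CMUdict-style dictionaries) and upper-cases everything; the helper names are hypothetical, and the simple regex tokenization would need adjusting for hyphenated words or numbers.

```python
import re
from typing import Iterable, List, Set


def load_lexicon_words(lines: Iterable[str]) -> Set[str]:
    """Collect headwords: the first whitespace-separated field per line."""
    words: Set[str] = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and CMUdict-style ";;;" comment lines.
        if line and not line.startswith(";;;"):
            words.add(line.split()[0].upper())
    return words


def missing_pronunciations(transcript: str, lexicon: Set[str]) -> List[str]:
    """Return the sorted, upper-cased transcript words with no lexicon entry."""
    tokens = {w.upper() for w in re.findall(r"[A-Za-z']+", transcript)}
    return sorted(tokens - lexicon)
```

Any word this reports needs a pronunciation added (or the text corrected) before the alignment is run.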
Contact me off-board if you like (tonyr at cantabResearch.com) and I should think we can work something out to get the timings you need.
Tony
--
Dr Tony Robinson, CEO Cantab Research Ltd
Phone: +44 845 009 7530, Fax: +44 845 009 7532
--- (Edited on 28-March-2007 12:42 pm [GMT+0100] by Tony Robinson) ---