VoxForge
I was trying out the forced alignment using HTK as described in the "Automated Audio Segmentation Using Forced Alignment" document. Everything worked great, except that I noticed that the VoxForge dictionary has multiple pronunciations for many words using a (2) suffix on the word. When running this process, the dictionary created for doing the forced alignment uses only the first pronunciations. Is that intended, or is there a mismatch here in the lexicon format that HTK expects?
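A minimal sketch of the suspected mismatch. It assumes the dictionary is a plain-text file with one entry per line, the headword is the first whitespace-separated field, and alternates are marked CMU-style as `WORD(2)`, `WORD(3)`. HTK lists alternate pronunciations by repeating the same headword on separate lines. If HTK treats `WORD(2)` as a separate headword, those entries would never match a transcript that contains plain `WORD`. The helper `to_htk_entries` is a hypothetical name used for illustration. It strips the suffix and leaves the rest of each line untouched:

```python
import re

# Hypothetical pattern: a CMU-style variant marker such as "(2)" at the
# very end of the headword.
VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def to_htk_entries(lines):
    """Return dictionary lines with "(n)" variant suffixes removed from the
    headword, so all pronunciations of a word share one headword (HTK's
    convention: one line per pronunciation, same word repeated)."""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: the headword is the first whitespace-separated field.
        word, *rest = line.split(None, 1)
        word = VARIANT_SUFFIX.sub("", word)
        out.append(f"{word} {rest[0]}" if rest else word)
    return out


if __name__ == "__main__":
    sample = ["READ r iy d", "READ(2) r eh d"]
    for entry in to_htk_entries(sample):
        print(entry)
```

The suffix is removed only from the headword. Any phones or bracketed output symbol later on the line are passed through unchanged.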
Brent
PS: Thanks for that great tutorial!
--- (Edited on 11/16/2008 2:16 am [GMT-0600] by Visitor) ---
Hi Brent,
>When running this process, the dictionary created for doing the forced
>alignment uses only the first pronunciations. Is that intended, or is there a
>mismatch here in the lexicon format that HTK expects?
HTK's "forced alignment" matches the phoneme sounds it hears to the words in the pronunciation dictionary (a kind of "reverse lookup"). So it seems that, for words with multiple pronunciations, the speech in your audio best matches the first pronunciation listed.
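The "reverse lookup" idea can be shown with a toy sketch, assuming the dictionary lists several pronunciations under the same word. HVite actually scores each pronunciation against the audio with its acoustic models during Viterbi alignment. This sketch swaps that for a much simpler stand-in: it computes the phone-level edit distance between a hypothetical sequence of "heard" phones and each listed pronunciation. `edit_distance` and `pick_variant` are illustrative names, not HTK functions:

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete pa
                           cur[j - 1] + 1,            # insert pb
                           prev[j - 1] + (pa != pb)))  # substitute or match
        prev = cur
    return prev[-1]


def pick_variant(heard, variants):
    """Toy stand-in for alignment: return the pronunciation closest to the
    heard phones. min() keeps the first of any tied variants."""
    return min(variants, key=lambda v: edit_distance(heard, v))


if __name__ == "__main__":
    read_variants = [["r", "iy", "d"], ["r", "eh", "d"]]
    print(pick_variant(["r", "eh", "d"], read_variants))
```

In this toy, ties are broken in favour of the earliest-listed pronunciation.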
Ken
--- (Edited on 11/16/2008 9:21 am [GMT-0500] by kmaclean) ---
You might also be interested in a Perl audio segmentation script I started a while ago: Audiobook.pm. Its documentation is inline Perldoc.
I haven't added it to the "Automated Audio Segmentation Using Forced Alignment" page because it still needs some refinement, but I have used it successfully to segment speech text.
Ken
--- (Edited on 11/16/2008 9:48 am [GMT-0500] by kmaclean) ---