VoxForge
It may happen (usually when there is a big mismatch between the transcript and the actual speech in the recording) that no tokens reach the end of the utterance because of pruning. In that case the sentence is not included in aligned.mlf, but HTK does not report any error. When HERest is run again, it fails with an error because it cannot find the transcription of that sentence in aligned.mlf.
This can be solved by switching off pruning: leave out the -t switch.
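If you keep pruning on, a quick way to see which utterances actually made it into aligned.mlf is to list the entries in the file. Here is a minimal Python sketch, assuming the standard MLF layout (a `#!MLF!#` header followed by quoted `.lab` paths, one per utterance); the sample file names are made up for illustration:

```python
import re

def mlf_utterances(mlf_text):
    """Return utterance names (basenames without extension) found in an MLF."""
    names = []
    for line in mlf_text.splitlines():
        line = line.strip()
        # Entry headers in an MLF are quoted label paths, e.g. "*/sample1.lab".
        m = re.match(r'^"(.+)"$', line)
        if m:
            base = m.group(1).replace("\\", "/").rsplit("/", 1)[-1]
            names.append(base.rsplit(".", 1)[0])
    return names

mlf = '#!MLF!#\n"*/sample1.lab"\n0 1000000 sil\n.\n"*/sample3.lab"\n0 2000000 sil\n.\n'
print(mlf_utterances(mlf))  # -> ['sample1', 'sample3']
```

Any prompt that does not show up in this list was dropped during alignment.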
Hi tpavelka,
>When HERest is run again, it throws an error and ends because it
>cannot find the transcription of the sentence in aligned.mlf.
>This can be solved by switching off pruning: leave out the -t switch.
I think I am missing something here...
Does the incorrectly recognized audio still get included in the acoustic model, and if so, won't this reduce the accuracy of the acoustic model during speech recognition?
Therefore, is it not better to have the training process bomb, look at why the segment is not realigning properly, and remove (or correct) the problem audio segment and run the training process all over again?
thanks,
Ken
Hi Ken,
I understand what you mean, here is a more detailed analysis:
The cause of the problem:
When doing forced Viterbi alignment, HTK may not always generate transcriptions due to overpruning. Without tracing enabled, HTK does not report any errors, and those transcriptions are simply missing from the MLF. If you then run HERest, training crashes because it cannot find transcriptions for some of the sound files.
Possible solutions:
Hi tpavelka,
I think I may have found a fourth solution, though I am not sure how theoretically sound it is... it is a modified version of your 3rd solution.
In processing a new submission I pre-screen the audio as follows: I run a script that creates a monophone acoustic model from just the audio of the new submission. I then run the HVite realignment against the same submission audio to see whether everything matches. This is a very rough estimate, and is not perfect, but it seems to catch most of the major discrepancies between transcription and audio. The biggest advantage is that you don't need a full acoustic model training run to determine whether a particular submission is problematic.
I started this approach when I did not have enough audio in the corpus to make a good enough speaker independent acoustic model. It probably makes more sense now to just use the VoxForge acoustic model to check transcriptions of new submissions.
Regardless, if someone does not have access to a good speaker independent acoustic model, this approach might work on a large corpus if you can divide it up by speaker (or a subset of the audio), create a small monophone acoustic model for each speaker, and see how well the transcriptions match the audio, without having to wait for HERest to fail...
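The per-speaker split above can be sketched in a few lines of Python. This assumes a corpus layout where each feature file sits in a directory named after its speaker (e.g. `ken/a1.mfc`); that layout is an assumption for illustration, not a fixed VoxForge convention:

```python
from collections import defaultdict

def split_by_speaker(paths):
    """Group corpus file paths by speaker, taking the parent directory
    name as the speaker label (an assumed layout: speaker/utterance.mfc)."""
    groups = defaultdict(list)
    for p in paths:
        parts = p.replace("\\", "/").split("/")
        speaker = parts[-2] if len(parts) > 1 else "unknown"
        groups[speaker].append(p)
    return dict(groups)

corpus = ["ken/a1.mfc", "ken/a2.mfc", "anna/b1.mfc"]
print(split_by_speaker(corpus))
# -> {'ken': ['ken/a1.mfc', 'ken/a2.mfc'], 'anna': ['anna/b1.mfc']}
```

Each group can then be written out as its own .scp file and used to train a small per-speaker monophone model for the pre-screening pass.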
Ken
Hi Ken,
I don't know if I understand you correctly. Why is there a need for retraining? You already have a trained acoustic model, which can be used for forced Viterbi alignment, as in the standard scenario described in e.g. the HTKBook.
Then, all you need to do to avoid the train-crash-repeat problem is to check the aligned MLF for missing parts.
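Checking the aligned MLF for missing parts amounts to comparing the training file list against the entries that made it into the MLF. A minimal sketch in Python, assuming a plain .scp file (one feature-file path per line) and the standard MLF layout; the file contents below are made-up examples:

```python
def scp_utterances(scp_text):
    """Utterance names from a .scp file (one feature-file path per line)."""
    names = []
    for line in scp_text.splitlines():
        line = line.strip()
        if line:
            base = line.replace("\\", "/").rsplit("/", 1)[-1]
            names.append(base.rsplit(".", 1)[0])
    return names

def missing_from_mlf(scp_text, mlf_text):
    """Utterances listed in the .scp but absent from the aligned MLF."""
    aligned = set()
    for line in mlf_text.splitlines():
        line = line.strip()
        # MLF entry headers are quoted label paths, e.g. "*/sample1.lab".
        if line.startswith('"') and line.endswith('"'):
            base = line[1:-1].replace("\\", "/").rsplit("/", 1)[-1]
            aligned.add(base.rsplit(".", 1)[0])
    return [u for u in scp_utterances(scp_text) if u not in aligned]

scp = "mfc/sample1.mfc\nmfc/sample2.mfc\nmfc/sample3.mfc\n"
mlf = '#!MLF!#\n"*/sample1.lab"\n0 1000000 sil\n.\n"*/sample3.lab"\n.\n'
print(missing_from_mlf(scp, mlf))  # -> ['sample2']
```

Any utterance reported missing can then be removed from (or corrected in) the training script before the next HERest pass, so training never crashes.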
Of course, you can do forced Viterbi alignment on new submissions with the fully trained model and see whether it fails to return a result due to overpruning, or you can watch the resulting score, but I am afraid you will only catch the worst errors, like a totally silent recording, etc.
Oh, and sorry for not yet responding to the thread about tutorials, I will get to that eventually.
Hi tpavelka,
>Why is there a need for retraining? You already have a trained
>acoustic model, which can be used for forced Viterbi alignment,
I started this approach when I did not have enough audio in the corpus to make a good enough speaker independent acoustic model. I just never got around to using the SI VoxForge AM...
>all you need to do to avoid the train-crash-repeat problem is to check
>the aligned MLF for missing parts.
Seems simpler, thanks.
>Oh, and sorry for not yet responding to the thread about tutorials, I
>will get to that eventually.
No worries, we've all got many other priorities... :)
Ken