Realignment should be run without pruning

User: tpavelka
Date: 4/23/2009 7:34 am

It may happen (usually when there is a big mismatch between the transcript and the actual speech in the recording) that no tokens reach the end of the utterance because of pruning. In that case the sentence is not included in aligned.mlf, but HTK does not report any errors. When HERest is run again, it aborts with an error because it cannot find the transcription of the sentence in aligned.mlf.

This can be solved by switching off pruning: leave out the -t switch.

Re: Realignment should be run without pruning

User: kmaclean
Date: 5/3/2009 8:06 pm

Hi tpavelka,

>When HERest is run again, it throws an error and ends because it

>cannot find the transcription of the sentence in aligned.mlf.

>This can be solved by switching off pruning: leave out the -t switch.

I think I am missing something here...

Does the incorrectly recognized audio still get included in the acoustic model, and if so, won't this reduce the accuracy of the acoustic model during speech recognition?

Therefore, is it not better to have the training process bomb, look at why the segment is not realigning properly, and remove (or correct) the problem audio segment and run the training process all over again?

thanks,

Ken

Re: Realignment should be run without pruning

User: tpavelka
Date: 5/5/2009 4:34 am

Hi Ken,

I understand what you mean. Here is a more detailed analysis:

The cause of the problem:

When doing forced Viterbi alignment, HTK may fail to generate transcriptions for some utterances because of overpruning. Without tracing enabled, HTK does not report any errors, and those transcriptions are simply left out of the MLF. If you then run HERest training, it crashes because it cannot find transcriptions for some of the sound files.

Possible solutions:

1. Let HERest crash, find the problematic file and try to fix it or remove it. Run HERest again and repeat. I do not recommend this, since I have spent an entire day doing just that. One iteration of HERest can take quite a long time and you can have 10-20 such files (or more, depending on the number of errors in the corpus).
2. Run HVite without pruning and hope that the bad files do not seriously affect your final system performance. I agree that this is not an optimal solution, so a better one would be:
3. Write a script that either checks the trace log for warnings (I have not actually tried this, so I am not 100% sure they are there), or checks the transcriptions against the sound files and deals with those for which the transcriptions are missing.
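
A minimal sketch of the second variant of solution 3 (checking the transcriptions against the sound files) could look like the Python below. It is untested against a real corpus and rests on some assumptions: the list of training files is a plain text file with one feature file path per line (called train.scp here, a name I am assuming), each entry in aligned.mlf starts with a quoted label file name and ends with a line containing a single period, and utterances are matched by file name without directory or extension. The function names are made up for illustration.

```python
import posixpath


def utterance_id(path):
    """Reduce a file path or a quoted MLF pattern to a bare utterance name."""
    name = path.strip().strip('"').replace("\\", "/")
    return posixpath.splitext(posixpath.basename(name))[0]


def aligned_ids(mlf_lines):
    """Collect the utterance names that have an entry in the MLF."""
    ids = set()
    in_entry = False
    for raw in mlf_lines:
        line = raw.strip()
        if not in_entry and line.startswith('"'):
            # Entry header, e.g. "*/sample1.lab"
            ids.add(utterance_id(line))
            in_entry = True
        elif line == ".":
            # A single period closes the current entry.
            in_entry = False
    return ids


def missing_alignments(scp_lines, mlf_lines):
    """Return the training files that have no entry in the aligned MLF."""
    aligned = aligned_ids(mlf_lines)
    paths = [line.strip() for line in scp_lines if line.strip()]
    return [p for p in paths if utterance_id(p) not in aligned]
```

With real files this would be called as `missing_alignments(open("train.scp"), open("aligned.mlf"))`. The returned files can then be inspected by hand, or left out of the list given to HERest so that training does not stop on them.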

Re: Realignment should be run without pruning

User: kmaclean
Date: 5/5/2009 10:11 am

Hi tpavelka,

I think I may have found a fourth solution, though I am not sure how theoretically sound it is... it is a modified version of your 3rd solution.

When processing a new submission I pre-screen the audio as follows: I run a script that creates a monophone acoustic model with just the audio from the new submission. I then run the HVite realignment against the same submission audio to see whether everything matches. This is a very rough estimate and is not perfect, but it seems to catch most of the major discrepancies between transcription and audio. The biggest advantage is that you don't need a full acoustic model training run to determine whether a particular submission is problematic.

I started this approach when I did not have enough audio in the corpus to make a good enough speaker independent acoustic model. It probably makes more sense now to just use the VoxForge acoustic model to check transcriptions of new submissions.

Regardless, if someone does not have access to a good speaker independent acoustic model, this approach might work on a large corpus if you can divide it up by speaker (or into subsets of the audio), create a small monophone acoustic model for each speaker, and see how well the transcriptions match the audio, without having to wait for HERest to fail...

Ken

Re: Realignment should be run without pruning

User: tpavelka
Date: 5/5/2009 10:54 am

Hi Ken,

I don't know if I understand you correctly. Why is there a need for retraining? You already have a trained acoustic model, which can be used for forced Viterbi alignment, because in the standard scenario, as described e.g. in the HTKBook:

1. you start with a phoneme MLF that is created using a pronunciation dictionary and a word MLF
2. with this you run several iterations of HERest, so you have a trained SI acoustic model
3. only after you have this model do you run the forced Viterbi alignment, which can cause the problem with overpruning.

Then, all you need to do to avoid the train-crash-repeat problem is to check the aligned MLF for missing parts.

Of course, you can run forced Viterbi alignment on new submissions with the fully trained model and see whether it fails to return a result because of overpruning, or you can watch the resulting score, but I am afraid you will only catch the worst errors, like a totally silent recording etc.

Oh, and sorry for not yet responding to the thread about tutorials, I will get to that eventually.

Re: Realignment should be run without pruning

User: kmaclean
Date: 5/13/2009 12:20 pm

Hi tpavelka,

>Why is there a need for retraining? You already have a trained

>acoustic model, which can be used for forced Viterbi alignment,

I started this approach when I did not have enough audio in the corpus to make a good enough speaker independent acoustic model. I just never got around to using the SI VoxForge AM...

>all you need to do to avoid the train-crash-repeat problem is to check

>the aligned MLF for missing parts.

Seems simpler, thanks.

>Oh, and sorry for not yet responding to the thread about tutorials, I

>will get to that eventually.

No worries, we've all got many other priorities... :)

Ken
