HTK and true silence
User: Cavaco
Date: 6/17/2014 10:39 am
Views: 8075
Rating: 11

Hello,

I am using HTK to do phone alignment in English.

I found that padding audio files with true silence (using the pad option of SoX) at the beginning and end completely throws HTK off, even though I use a word network with 'sil'.

This happens when using USEPOWER=T (use power, not magnitude, in the fbank analysis).

When USEPOWER is set to false, the alignment is OK.

Does anyone have an explanation for this, i.e. for the difference between energy and magnitude with true silence?

Thanks in advance

Re: HTK and true silence
User: nsh
Date: 6/17/2014 2:30 pm
Views: 3833
Rating: 12

> I found that padding audio files with true silence (using the pad option of SoX) at the beginning and end completely throws HTK off, even though I use a word network with 'sil'.

This is a bad idea. The feature extraction algorithm takes a logarithm, and zero-energy frames cause numerical problems (log(0) is minus infinity). To deal with that, the recognizer applies a threshold (floor) before taking the log, but that threshold is usually quite small, so the resulting large negative values distort later steps such as cepstral mean normalization (CMN). It is better to avoid zero-energy regions.
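To make the effect concrete, here is a toy Python sketch (not HTK's code) of a floored log energy followed by mean normalization as in CMN. The floor value LOG_FLOOR and the synthetic frames are assumptions chosen only for illustration. It shows that a block of digital-zero frames gets pinned at log(floor) and drags the utterance mean far down, so after mean subtraction the real speech frames are shifted well away from where they would otherwise sit.

```python
import math

# Hypothetical floor applied before the log; real front ends use their own values.
LOG_FLOOR = 1e-10

def log_energy(frame, floor=LOG_FLOOR):
    """Log of the frame energy, clamped at `floor` so log(0) never occurs."""
    energy = sum(s * s for s in frame)
    return math.log(max(energy, floor))

def cmn(values):
    """Mean normalization as in CMN: subtract the utterance mean from every frame."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Toy "speech" frames with moderate amplitude, plus frames of true (digital) silence.
speech = [[0.1 * ((i + n) % 7 - 3) for n in range(160)] for i in range(20)]
silence = [[0.0] * 160 for _ in range(10)]

plain = [log_energy(f) for f in speech]
padded = [log_energy(f) for f in silence + speech + silence]

print("zero frame log energy:", padded[0])
print("mean without padding:", sum(plain) / len(plain))
print("mean with padding:   ", sum(padded) / len(padded))
print("first speech frame after CMN, unpadded:", cmn(plain)[0])
print("first speech frame after CMN, padded:  ", cmn(padded)[10])
```

With the padding, the zero frames sit at log(1e-10), about -23, and pull the mean down by more than 10 nats, which is exactly the kind of large negative value that then leaks into CMN.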

> Does anyone have an explanation on this? (on the difference between energy and magnitude with true silence)

Zero frames might be handled differently in the energy feature computation; for example, the threshold might be different. Still, in both cases you should see a degradation. It's better to avoid padding altogether.
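One possible way to see how the same kind of floor can behave differently with USEPOWER=T and USEPOWER=F is the hypothetical Python sketch below. It is not HTK's implementation: the floor FLOOR and the channel value 50.0 are made up. Because log(x^2) = 2 log(x), taking the log of power instead of magnitude doubles the log value of every non-zero channel, while a zero channel stays clamped at log(FLOOR), so for channel values above 1 the jump between speech and padded silence gets larger.

```python
import math

# Hypothetical floor applied to each filterbank channel before the log.
FLOOR = 1e-3

def log_channel(value, floor=FLOOR):
    """Floored natural log of one filterbank channel output."""
    return math.log(max(value, floor))

def gap_to_silence(speech_magnitude, use_power):
    """Log-domain distance between a speech channel and a true-silence channel."""
    speech = speech_magnitude ** 2 if use_power else speech_magnitude
    silence = 0.0  # true silence: the channel output is exactly zero
    return log_channel(speech) - log_channel(silence)

gap_f = gap_to_silence(50.0, use_power=False)
gap_t = gap_to_silence(50.0, use_power=True)
print(f"USEPOWER=F: gap = {gap_f:.2f}")
print(f"USEPOWER=T: gap = {gap_t:.2f}")
```

Either way the zero frames end up at an artificial value set by the floor, which is consistent with the advice to avoid padding with true silence in the first place.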