VoxForge
Hello,
I am working on speaker-independent isolated word recognition. I implemented a recognition system using hidden Markov models in MATLAB. However, my supervisor has asked me to do it using HTK as well, since it is the state of the art. I have been trying to learn it from the HTK Book and other tutorials found on Google, but without much success. I have a deadline, and picking out the information I need from a 300-page book is not trivial at all :) So I would appreciate any help from this forum. Some of my questions are below:
1) How do we write the configuration files?
I have written one for HCopy with a .txt extension. I got 'error 5010: cannot open source file'. I checked the HTKdemo provided, and the configuration files there have the extension .dcf. I tried that too, but nothing changed.
2) When I look at the configuration parameters for HCopy, I see that source and target formats are 'HTK'. What is an HTK format and can't we use .wav files as input? Also, what is a source kind?
3) I know that with HTK we can compute speech recognition features such as MFCC. However, I would like to obtain my features in a different way. I will then use discrete HMMs, so I need to quantize my data with HQuant. Does HQuant accept my features?
4) I am not familiar with Perl, is it really required for HTK?
Kind regards
Seliz
--- (Edited on 7/22/2009 9:48 am [GMT-0500] by selizk) ---
> I have a deadline, and picking out the information I need from a 300-page book is not trivial at all :)
We are also short of money.
Btw, the HTK Book is nicely organized and you don't need to read all of it, just a few important chapters such as the tutorial, which describes everything in 40 pages. You could have read it already while waiting for this answer.
> I have written one for HCopy with a .txt extension. I got 'error 5010: cannot open source file'. I checked the HTKdemo provided, and the configuration files there have the extension .dcf. I tried that too, but nothing changed.
The extension doesn't matter. Most probably you made a mistake somewhere else, and that mistake is what caused error 5010.
> I see that source and target formats are 'HTK'. What is an HTK format and can't we use .wav files as input? Also, what is a source kind?
The target format is the format of the output file. The source format can be WAV (Microsoft WAVE files). SOURCEKIND is described in the HTK Book on page 71.
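To make the source/target distinction concrete, here is a rough sketch of a script that writes an HCopy config for turning WAV input into MFCC feature files and builds the matching command line. The parameter values are modeled on the coding example in the HTK Book tutorial and are only a starting point. The helper names (write_config, hcopy_command) and the file names are made up for illustration.

```python
import shlex
from pathlib import Path

# Values modeled on the HTK Book tutorial; adjust them for your own data.
HCOPY_CONFIG = {
    "SOURCEFORMAT": "WAV",      # input files are Microsoft WAVE files
    "TARGETKIND": "MFCC_0",     # output features: MFCCs plus the 0th coefficient
    "TARGETRATE": "100000.0",   # frame shift in 100 ns units (10 ms)
    "WINDOWSIZE": "250000.0",   # analysis window in 100 ns units (25 ms)
    "USEHAMMING": "T",
    "PREEMCOEF": "0.97",
    "NUMCHANS": "26",
    "CEPLIFTER": "22",
    "NUMCEPS": "12",
}


def write_config(path, params=HCOPY_CONFIG):
    """Write one 'NAME = VALUE' line per parameter and return the text."""
    text = "".join(f"{key} = {value}\n" for key, value in params.items())
    Path(path).write_text(text)
    return text


def hcopy_command(config_path, source, target):
    """Build the argument list for: HCopy -C config source target."""
    return ["HCopy", "-C", str(config_path), str(source), str(target)]


if __name__ == "__main__":
    write_config("config")
    # Prints the command; run it yourself once HCopy is on your PATH.
    print(shlex.join(hcopy_command("config", "1A.wav", "1A.mfc")))
```

Since no TARGETFORMAT is given, the output is written in HTK's own parameter file format, which is what the other HTK tools read.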
> Does HQuant accept my features?
You can pass it any feature file you like. You just need to convert the file to the format specified in the config.
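As a rough sketch of what that conversion can look like: an HTK parameter file is a 12-byte header (number of frames, frame period in 100 ns units, bytes per frame, parameter kind) followed by the frames as 4-byte floats, big-endian by default. The code below writes externally computed features with the USER parameter kind (code 9). The function name write_htk_user and the 10 ms default frame shift are assumptions for illustration, so check the result against the HTK Book before relying on it.

```python
import struct

USER_KIND = 9  # HTK parameter-kind code for user-defined features


def write_htk_user(path, frames, frame_shift_s=0.010):
    """Write equal-length float vectors as an HTK parameter file of kind USER."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same dimension")
    samp_period = int(round(frame_shift_s * 1e7))  # frame period in 100 ns units
    samp_size = 4 * dim                            # bytes per frame (float32)
    with open(path, "wb") as out:
        # 12-byte big-endian header: nSamples, sampPeriod, sampSize, parmKind
        out.write(struct.pack(">iihh", len(frames), samp_period,
                              samp_size, USER_KIND))
        for frame in frames:
            out.write(struct.pack(f">{dim}f", *frame))
```

For example, write_htk_user("1A.usr", [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]) writes two 3-dimensional frames. The config used when reading such files would then need a matching kind (USER).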
> I'm not familiar with Perl, is it really required for HTK?
No
--- (Edited on 7/22/2009 6:23 pm [GMT-0500] by nsh) ---
Thank you for the answer.
For the config file, I have written a text document containing:
SOURCEFORMAT = WAV
SOURCEKIND = WAVEFORM
TARGETKIND = WAVEFORM
Then, from the command window, I go to the location of this config file and run:
HCopy -C config 1A.wav tgt.wav
1A.wav also exists in that location. I still have the same error, error 5010: cannot open source file. Does anyone know what the reason might be?
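One way to rule out simple file-name problems before looking any further is a small check like the sketch below. It only tests assumptions: that the names on the command line exist exactly as typed (on Windows, a "text document" saved as config may really be config.txt with the extension hidden), and that the .wav file can be parsed as PCM WAVE by Python's wave module. The function name diagnose is hypothetical, and the check says nothing about what HTK itself means by error 5010.

```python
import os
import wave


def diagnose(directory, config_name, wav_name):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    present = os.listdir(directory)
    for name in (config_name, wav_name):
        if name in present:
            continue
        # Look for files that differ only by case or by an extra extension.
        lookalikes = [p for p in present
                      if p.lower() == name.lower()
                      or p.lower().startswith(name.lower() + ".")]
        hint = f" (did you mean {lookalikes}?)" if lookalikes else ""
        problems.append(f"{name} not found in {directory}{hint}")
    wav_path = os.path.join(directory, wav_name)
    if os.path.isfile(wav_path):
        try:
            with wave.open(wav_path, "rb") as wav_file:
                wav_file.getnframes()
        except (wave.Error, EOFError) as exc:
            problems.append(f"{wav_name} is not a readable PCM WAV file: {exc}")
    return problems
```

Calling diagnose(".", "config", "1A.wav") from the folder where HCopy is run returns an empty list if both names match exactly and the WAV header parses.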
PS: I still think the HTK Book is complicated, even if it is well organized; it seems to be aimed at professionals.
--- (Edited on 7/23/2009 8:56 am [GMT-0500] by selizk) ---
>SOURCEFORMAT = WAV
>SOURCEKIND = WAVEFORM
>TARGETKIND = WAVEFORM
>HCopy -C config 1A.wav tgt.wav
That should work - you might try:
Did you have any errors compiling HTK? What are you running on - Windows vs Linux?
Ken
--- (Edited on 7/23/2009 11:44 am [GMT-0400] by kmaclean) ---