VoxForge
Dear Voxforge developers, I am asking about details of the MFCC format used in your acoustic models.
The global settings in hmmdefs (Julius_AcousticModels_16kHz-16bit_MFCC_O_D_build726.tgz) are as follows:
~o
<STREAMINFO> 1 25
<VECSIZE> 25<NULLD><MFCC_D_N_Z_0><DIAGC>
In step 5 of the data preparation tutorial you suggest using
TARGETKIND = MFCC_0_D
How do MFCC_D_N_Z_0 and MFCC_O_D agree? And what is the exact meaning (and order) of the 25 vector components in hmmdefs? For these qualifiers:
_D Delta coefficients appended
_N Absolute log energy suppressed
_Z Cepstral mean subtracted
_0 Cepstral C0 coefficient appended
I would expect (12+1)*2=26 components (NUMCEPS=12, + C0, doubled by delta coefficients), not 25 components.
Thanks
Martijn
--- (Edited on 7/21/2007 3:24 am [GMT-0500] by Visitor) ---
Hi Martijn,
> How do MFCC_D_N_Z_0 and MFCC_O_D agree?
I can't remember the exact details on this (it's been a while ...), but at one point Julius/Julian only accepted the MFCC_D_N_Z_0 and MFCC_E_D_N_Z feature formats (I used MFCC_D_N_Z_0 because it seemed to provide better recognition results - I need to revisit this ...). To complicate things, HTK's HCopy command would not let me create MFCC_D_N_Z_0 format files.
As a workaround, I had to create MFCC files using the MFCC_0_D feature format, then specify the desired target format (MFCC_D_N_Z_0) in the proto file and use the HCompV command to convert them to the correct feature format (as set out in step 6 of the VoxForge tutorial).
Note that as of release 3.5.1, Julius supports all MFCC qualifiers (_0, _E, _N, _D, _A, _Z) in any combination (according to the release notes).
>And what is the exact meaning (and order) of the 25 vector components in hmmdefs?
I don't know the exact meaning off-hand; I've never had to get into the details of this. I assume that with the new versions of Julius/Julian, the feature format qualifiers (_0, _E, _N, _D, _A, _Z) can be in any order.
With respect to the meaning of the vector components, the Julius 3.2 book, section 7, "Feature parameter files (-input mfcfile)", might be helpful:
Feature Parameter Types
It is necessary that the feature parameter format matches the original HMM training data feature format. However, if all the necessary parameters for the HMM are held within the given feature parameter file, Julius will then automatically extract the appropriate parameters for recognition. For example:
If the parameter format below is used for training
MFCC_E_D_N_Z = MFCC(12) + ΔMFCC(12) + ΔPow(1) (CMN) 25-dimension
Then for recognition you can also use feature parameter files other than MFCC_E_D_N_Z, such as
MFCC_E_D_Z = MFCC(12) + Pow(1) + ΔMFCC(12) + ΔPow(1) (CMN) 26-dimension
or
MFCC_E_D_A_Z = MFCC(12) + Pow(1) + ΔMFCC(12) + ΔPow(1) + ΔΔMFCC(12) + ΔΔPow(1) (CMN) 39-dimension
The parameter file needs to contain all of the parameters used for the original training of the HMM model; extra data contained within the file will not be used.
where (from the HTK book):
_E has energy
_N absolute energy suppressed
_D has delta coefficients
_A has acceleration coefficients
_C is compressed
_Z has zero mean static coef.
_K has CRC checksum
_0 has 0’th cepstral coef
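To make the dimension counts above concrete, here is a rough Python sketch that expands a parameter kind into a list of component labels. It is only an illustration under stated assumptions, not HTK or Julius code: the function name `feature_layout` and the label names are made up, NUMCEPS defaults to 12, the ordering (static coefficients, then C0/energy, then deltas, then accelerations) is assumed, and _N is assumed to drop the static C0/energy term while keeping its delta.

```python
def feature_layout(target_kind, numceps=12):
    """Expand an MFCC parameter kind into component labels (hypothetical helper).

    Assumptions: static terms come first, then deltas, then accelerations,
    and _N removes the static c0/E term but keeps its delta/acceleration.
    """
    base, *quals = target_kind.split("_")
    if base != "MFCC":
        raise ValueError("this sketch only handles MFCC kinds")
    quals = set(quals)

    # Static part: c1..cN, optionally followed by c0 and/or energy.
    static = [f"c{i}" for i in range(1, numceps + 1)]
    if "0" in quals:
        static.append("c0")
    if "E" in quals:
        static.append("E")

    layout = list(static)
    if "D" in quals:
        layout += ["d_" + name for name in static]  # delta coefficients
    if "A" in quals:
        layout += ["a_" + name for name in static]  # acceleration coefficients
    if "N" in quals:
        # Absolute (static) energy suppressed; delta terms are left in place.
        layout = [name for name in layout if name not in ("c0", "E")]
    # _Z (cepstral mean subtraction) does not change the vector size.
    return layout


if __name__ == "__main__":
    for kind in ("MFCC_0_D", "MFCC_D_N_Z_0", "MFCC_E_D_Z", "MFCC_E_D_A_Z"):
        print(kind, len(feature_layout(kind)))
```

Under these assumptions MFCC_0_D gives (12+1)*2 = 26 components, while MFCC_D_N_Z_0 gives 12 static + 12 delta + 1 delta C0 = 25, which matches the VECSIZE of 25 in hmmdefs.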
The HTK manual provides more information.
You might also try downloading and searching the HTK mailing list archives (note: you need to be logged in to HTK to download these ...):
htk-users.mbox 09-Feb-2007 14:08 20M
htk-developers.mbox 09-Feb-2007 10:36 2.9M
Ken
--- (Edited on 7/22/2007 10:30 pm [GMT-0400] by kmaclean) ---
See this post for more info: converting .wav to .mfc using HCopy in HTK which says:
It would seem to be that the conversion is *not* actually done in the proto (as I had written in the article referred to in nsh's post), but in the config file. From Step 6 of the VoxForge Tutorial:
You also need a configuration file. Create a file called 'config' in your 'voxforge/manual' directory and add the following data:
TARGETKIND = MFCC_0_D_N_Z
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
Which is then used in HCompV:
$HCompV -A -D -T 1 -C config -f 0.01 -m -S train.scp -M hmm0 proto
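As a side note on reading the config above: HTK expresses TARGETRATE and WINDOWSIZE in units of 100 ns, so the values can be converted to milliseconds as in the small Python sketch below. The helper name `htk_to_ms` is made up for illustration.

```python
# HTK time values are given in units of 100 ns, i.e. 10,000 units per millisecond.
HTK_UNITS_PER_MS = 10_000.0


def htk_to_ms(value):
    """Convert an HTK time value (100 ns units) to milliseconds (hypothetical helper)."""
    return value / HTK_UNITS_PER_MS


frame_shift_ms = htk_to_ms(100000.0)   # TARGETRATE from the config
window_ms = htk_to_ms(250000.0)        # WINDOWSIZE from the config
frames_per_second = 1000.0 / frame_shift_ms

if __name__ == "__main__":
    print(f"frame shift: {frame_shift_ms} ms")
    print(f"window size: {window_ms} ms")
    print(f"frames per second: {frames_per_second}")
```

So this config produces one 25 ms analysis window every 10 ms, i.e. 100 feature vectors per second of audio.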
--- (Edited on 6/9/2009 5:33 pm [GMT-0400] by kmaclean) ---
Dear kmaclean,
I read through the post: converting .wav to .mfc using HCopy in HTK, but still have some questions about the MFCC format.
I am currently trying to force-align my own speech data with the Julius_352_models. I tried two ways:
1. As another post suggested, I used different configuration files for HCopy and HVite: HCopy uses MFCC_0_D_Z and HVite uses MFCC_0_D_N_Z. HVite ran with no errors. However, the alignment does not seem close to correct when checked against the waveform plot of the speech. Will using two different configuration files actually work for this task?
2. I tried to follow the instructions in step 6, which use HCompV to convert the MFCC format. However, HCompV does not generate a feature file (*.mfc) for my sample. As I understand it, HCompV is used for training, not for converting data for testing.
Thank you
Chat
--- (Edited on 10/14/2009 8:55 am [GMT-0500] by Visitor) ---
Hi Chat,
>I am currently trying to force-align my own speech data with the
>Julius_352_models.
The VoxForge models are not perfect... you might try one of the current Nightly Builds.
If you want better forced alignment results on your own speech, then you might be better off creating an acoustic model using a subset of your speech data (i.e. do manual alignment and then train).
Further, you could use this transcribed subset to adapt the VoxForge acoustic model or just submit it to VoxForge for inclusion in the VoxForge corpus and acoustic models.
>2. I tried to follow the instruction of step 6 which use HCompV to
>convert the MFCC format. However, in the HCompV, it does not
>generate feature file (*.mfc) for my sample.
You still need to convert your audio to MFCC as stated in Step 5.
It's best not to try to skip steps until you understand the whole process really well.
Ken
--- (Edited on 10/15/2009 2:22 pm [GMT-0400] by kmaclean) ---