VoxForge
Hi,
congratulations on this valuable project! I'd like to start experimenting with HTK and have a few questions.
1) Where do I find English documentation for Julius? On the Japanese SourceForge site, the first thing I see is a lot of '?????'. Is it an alternative to HTK, or do I need to install it?
2) Will there be any compiled Acoustic Models (trained HMMs)? This is what I expected to find under downloads->acoustic models; instead there are word-phoneme dictionaries. Probably I am misunderstanding something?
3) If I submit speech, do I just have to provide a text transcription, or also a phonetic transcription? What list of phonemes do you use? The format of the text transcriptions seems to differ between the corpora, which rules should I follow?
4) Is there a standard for how the MFCC coefficients are calculated? There are a lot of options concerning frequency bands, triangular/Hamming filter functions in mel/linear space etc.
Arno
--- (Edited on 12/15/2006 1:51 pm [GMT-0600] by Visitor) ---
Hi Arno,
>1) Where do I find English documentation for Julius? On the Japanese SourceForge site, the first thing I see is a lot of '?????'. Is it an alternative to HTK, or do I need to install it?
The English documentation is there - the Japanese website is under maintenance (I just used Google's translation service to read that message). It is also included on this website under Docs.
You can use either HTK's HVite or Julius as your Speech Recognition Engine. Julius uses Acoustic Models created with the HTK toolkit. Julius uses a modified BSD-type license, whereas HTK does not permit distribution of the software (you can still distribute any AMs you create with HTK).
>2) Will there be any compiled Acoustic Models (trained HMMs)? This is what I thought to find below downloads->acoustic models, instead there are word-phoneme dictionaries. Probably I am misunderstanding something?
Yes - look again ... under http://www.repository.voxforge1.org/downloads/builds/0.1.1-build726/ and download a Julius/HTK AM. You can also try the QuickStart Download, which has everything you need to get started with Julius.
>3) If I submit speech, do I just have to provide a text transcription, or also a phonetic transcription? What list of phonemes do you use? The format of the text transcriptions seems to differ between the corpora, which rules should I follow?
The VoxForge Submit Speech link walks you through everything you need to know. Basically you read off some phonetically balanced prompts (which we provide), and record wav files using Audacity. No phoneme list required.
If you want to use your own text you can, just make sure you submit uncompressed wav files (or lossless compressed) along with your word level transcriptions.
>4) Is there a standard for how the MFCC coefficients are calculated? There are a lot of options concerning frequency bands, triangular/Hamming filter functions in mel/linear space etc.
Julius basically reduced the options in this regard because it could only accept certain types of MFCC files. Newer versions of Julius no longer have this limitation - I have not had a chance to experiment with the different options yet.
Since we are collecting wav audio, rather than MFCC files, we can convert the wav files to whichever MFCC format works best.
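As a rough illustration of that conversion step, the sketch below builds the text of an HTK configuration file and of a script file listing wav/MFCC pairs for HCopy. The parameter names (SOURCEFORMAT, TARGETKIND, NUMCHANS and so on) are standard HTK configuration variables, but the values are typical tutorial-style settings assumed for illustration, not a confirmed VoxForge configuration. The helper names, file names and the .mfc extension are hypothetical.

```python
from pathlib import Path

# Assumed, tutorial-style HTK front-end settings (not a confirmed VoxForge config).
HCOPY_SETTINGS = {
    "SOURCEFORMAT": "WAV",       # input files are wav audio
    "TARGETKIND": "MFCC_0_D",    # cepstra + c0 + delta coefficients
    "TARGETRATE": "100000.0",    # frame shift in 100 ns units (10 ms)
    "WINDOWSIZE": "250000.0",    # analysis window in 100 ns units (25 ms)
    "USEHAMMING": "T",           # apply a Hamming window to each frame
    "PREEMCOEF": "0.97",         # pre-emphasis coefficient
    "NUMCHANS": "26",            # number of mel filterbank channels
    "CEPLIFTER": "22",           # cepstral liftering coefficient
    "NUMCEPS": "12",             # number of cepstral coefficients
}


def render_config(settings):
    """Return the settings as 'NAME = value' lines, one per parameter."""
    return "".join(f"{name} = {value}\n" for name, value in settings.items())


def render_script(wav_paths, mfcc_dir):
    """Return script-file text: one 'source target' pair per line."""
    lines = []
    for wav in wav_paths:
        wav = Path(wav)
        target = Path(mfcc_dir) / (wav.stem + ".mfc")
        lines.append(f"{wav.as_posix()} {target.as_posix()}")
    return "\n".join(lines) + "\n"
```

Writing the two strings to files (say hcopy.conf and codetrain.scp) and running `HCopy -C hcopy.conf -S codetrain.scp` would then produce the MFCC files. Because the wav files are kept, the same audio can be re-coded later with different settings.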
Hope that helps,
Ken
--- (Edited on 12/15/2006 3:52 pm [GMT-0600] by Visitor) ---
Ken,
thanks for your reply. Most of my initial problems are solved now. In an attempt to better understand the structure of the VoxForge documentation, I wonder if I could have found the QuickStart download link anywhere except in your reply? If yes, I might find answers to later questions in the same location...
When I next have some of my very limited time, I'll try to figure out which MFCC format Julius uses, or maybe someone can point me to the right place to look?
Thanks
Arno
--- (Edited on 12/17/2006 3:28 pm [GMT-0600] by Visitor) ---
The VoxForge Home page (see topmost menu above) contains the link to the QuickStart Download page near the top right-hand corner (big green arrow pointing down).
With respect to finding answers to other questions, you might try the 'Google Search' feature on the Home page - this only searches the 'www.voxforge.org' domain (note that it sometimes takes a few days for Google's web crawler to include recent changes to the website).
You can also go to the VoxForge Dev Site by clicking 'dev' on the topmost menu above, and clicking the 'VoxForge Dev -Trac' link. This sends you to the VoxForge Trac bug/issue tracking system (and to a different domain name: 'www.dev.voxforge.org'), and on it there is a search feature (near the top right hand corner) that lets you search for any issues that might be related to the topic you have questions on.
The Julius website contains info on MFCC. The 'History of Changes' section at the bottom of the page contains info on the release 3.5.1 changes (VoxForge uses Julius Multipath, release 3.5.2, and uses MFCC_0_D). It states the following:
  o Wider MFCC types support:
    - Added extraction of acceleration coefficients (_A). Now you
      can recognize waveform or microphone input with AM trained with _A.
    - Support all MFCC qualifiers (_0, _E, _N, _D, _A, _N, _Z) and their
      combination
    - Support for any vector length (will be guessed from AM header)
    - New option: "-accwin"
    - New option "-zmeanframe": frame-wise DC offset removal, like HTK
    - New options to specify detailed analysis parameters (see manual):
      -preemph, -fbank, -ceplif, -rawe / -norawe,
      -enormal / -noenormal, -escale, -silfloor
If you download Julius, you will get the most up-to-date documentation.
If you are looking for further details on the MFCC format used by Julius, then the next best place to look would be the HTK documentation (note: you need to register with HTK to be able to download this manual).
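To make the qualifier combinations a little more concrete, here is a small sketch that estimates the vector length implied by a parameter kind such as MFCC_0_D. It assumes 12 cepstral coefficients per frame (a common setting, not something stated on the Julius page), and it treats _N as dropping exactly one static energy-like term. The function name is hypothetical and the rules are simplified, so check them against the HTK manual.

```python
def feature_vector_length(parm_kind, num_ceps=12):
    """Estimate the vector length implied by an HTK-style MFCC kind string.

    Simplified sketch: num_ceps=12 is an assumption, and only the
    qualifiers _0, _E, _N, _D, _A and _Z are handled.
    """
    base, *qualifiers = parm_kind.upper().split("_")
    if base != "MFCC":
        raise ValueError(f"only MFCC kinds are handled here: {parm_kind}")
    quals = set(qualifiers)
    unknown = quals - {"0", "E", "N", "D", "A", "Z"}
    if unknown:
        raise ValueError(f"unhandled qualifiers: {sorted(unknown)}")
    if "A" in quals and "D" not in quals:
        raise ValueError("_A (acceleration) requires _D (delta)")
    if "N" in quals and "D" not in quals:
        raise ValueError("_N is only meaningful together with _D")

    # Static part: cepstra plus optional c0 (_0) and log energy (_E).
    static = num_ceps + ("0" in quals) + ("E" in quals)
    # Each of _D and _A appends another copy of the static layout.
    length = static * (1 + ("D" in quals) + ("A" in quals))
    # _N: absolute energy suppressed (assumed to remove one static term).
    if "N" in quals:
        length -= 1
    # _Z (cepstral mean normalisation) does not change the length.
    return length
```

Under these assumptions MFCC_0_D gives 26 components and MFCC_0_D_A_Z gives 39.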
Ken
--- (Edited on 12/17/2006 8:13 pm [GMT-0500] by kmaclean) ---
I also have the same question about the MFCC feature format. I trained the HMM models using the MFCC_0_D_A_Z feature format on the WSJ0 training files. When using Julius for batch recognition, the input files are MFCC files in MFCC_0_D format. Is that right? At the beginning, I just wanted to code the wav files to MFCC_0_D_A_Z, but HTK's HCopy command would not let me do that. How can I code the wav files into the MFCC_0_D_A_Z format to match the HMM models?
Another question: can I do live recognition using HMM models that were trained on MFCC_0_D_A_Z feature files (the vector length is 39)? And if that is OK, does it have any relation to the .jconf options?
I have run this experiment before, but the results were very poor.
--- (Edited on 3/19/2008 1:55 am [GMT-0500] by xmuasrer) ---
Hi xmuasrer,
> How can I code the wav files into the MFCC_0_D_A_Z format to
>match the HMM models?
See this link. Basically, create MFCC files using the MFCC_0_D feature format. Then specify the desired target format (MFCC_D_N_Z_0) in the proto file and use the HCompV command to convert them to the correct feature format (as set out in step 6 of the VoxForge tutorial).
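One way to see why qualifiers such as _D and _A can be added after the fact is that they are derived from coefficients already present in the file. The sketch below uses the regression formula described in the HTK manual, d_t = sum over k of k*(c[t+k] - c[t-k]) divided by 2*sum of k^2, with a window of 2 and the first and last frames repeated at the edges (both assumed here). It is an illustration in plain Python, not HTK's own code, and the function names are hypothetical.

```python
def deltas(frames, window=2):
    """Regression-based delta coefficients for a list of equal-length frames."""
    if not frames:
        return []
    norm = 2 * sum(k * k for k in range(1, window + 1))
    last = len(frames) - 1
    result = []
    for t in range(len(frames)):
        row = []
        for i in range(len(frames[t])):
            acc = 0.0
            for k in range(1, window + 1):
                # Repeat the first/last frame when the window runs off the edge.
                ahead = frames[min(t + k, last)][i]
                behind = frames[max(t - k, 0)][i]
                acc += k * (ahead - behind)
            row.append(acc / norm)
        result.append(row)
    return result


def append_deltas_and_accels(static_frames, window=2):
    """Return frames laid out as static + delta + acceleration components."""
    d = deltas(static_frames, window)
    a = deltas(d, window)  # acceleration = regression applied to the deltas
    return [s + dv + av for s, dv, av in zip(static_frames, d, a)]
```

With 13 static components per frame (12 cepstra plus c0), append_deltas_and_accels returns frames of length 39, which matches the vector length mentioned in the question.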
>can I do live recognition using the HMM models which trained by the
>MFCC_0_D_A_Z feature files(the vector length is 39)?
Yes - see link above.
>Or if it is OK, does it have any relation to the .jconf options?
Not sure what you are asking here ... If you are asking whether you need to change the jconf file in any way as a result of using a particular feature format, then the answer is: I don't think so.
Ken
--- (Edited on 3/19/2008 1:52 pm [GMT-0400] by kmaclean) ---