Acoustic Model Discussions

Adapting Acoustic Model w/ Language Model
User: Visitor
Date: 6/1/2012 12:43 am
Views: 6859
Rating: 8

I am going to try to adapt the VoxForge Acoustic Model and create my own English Language Model, but I have some questions.

 

Step 1: Language Model

Obtain pre-compiled corpora, mainly from NLTK

  • Gutenberg Corpus
  • Web and Chat Text
  • Brown Corpus
  • Reuters Corpus

The NLTK book also references additional corpora that I'm going to try and hunt down, and the HTK book ships with 50 volumes of Sherlock Holmes. Any recommendations for useful pre-compiled corpora?

I also want to attempt compiling my own corpora from unstructured text.

  1. Tokenize sentences: using the pre-trained (for English) Punkt sentence tokenizer
  2. convert all letters to lower-case: text.lower()
  3. convert numbers into words: custom function
  4. remove accent marks and diacritics: unicodedata module
  5. expand abbreviations: Are there any precompiled databases of abbreviations and their expansions?
  6. spelling canonicalization: Once again, any precompiled databases of alternative spellings?
  7. contractions: How should contractions be expanded into words: "can't" -> ["can not"] or ["can","t"] or ["ca","nt"]
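The mechanical steps in the list above (lower-casing, stripping diacritics, spelling out numbers) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not a reference pipeline: `number_to_words`, `strip_accents` and `normalize_sentence` are hypothetical helper names, numbers are spelled out in a simple style without "and", sentence splitting (Punkt) is assumed to have happened already, and contractions are left intact because the right expansion is still an open question here.

```python
import re
import unicodedata

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty",
         "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer 0 <= n < 1,000,000 as space-separated words."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0..999999 is supported in this sketch")
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("" if ones == 0 else " " + _ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = "" if rest == 0 else " " + number_to_words(rest)
        return _ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = "" if rest == 0 else " " + number_to_words(rest)
    return number_to_words(thousands) + " thousand" + tail


def strip_accents(text):
    """Decompose characters and drop the combining marks (accents)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def _spell_number(match):
    digits = match.group()
    if len(digits) > 6:
        # Too large for number_to_words: read it digit by digit instead.
        return " ".join(_ONES[int(d)] for d in digits)
    return number_to_words(int(digits))


def normalize_sentence(sentence):
    """Return a list of lower-case, accent-free word tokens."""
    text = strip_accents(sentence.lower())
    text = re.sub(r"\d+", _spell_number, text)
    # Keep letters, apostrophes and spaces; everything else becomes a space.
    text = re.sub(r"[^a-z' ]+", " ", text)
    return text.split()


print(normalize_sentence("The café sold 42 crêpes."))
```

Abbreviation expansion and spelling canonicalization would slot in as extra lookup steps before the final `split()`, once a suitable word list is available.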

Lastly, is accuracy ultimately important? Can a relatively small error (say 0.005) have a noticeable impact on speech recognition?

My most important concern in creating a Language Model is dealing with words not contained in the vocabulary file. Should I delete any sentences containing unknown words? Is an open vocabulary language model (containing unknown words) acceptable to julius, or does a closed vocabulary language model (no unknown words) perform better?
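The two options in the question above (dropping sentences with unknown words versus keeping them with an unknown-word marker) can be compared on a corpus before committing to either. The sketch below assumes sentences are already tokenized into word lists and that the vocabulary is a plain set of words; the `<unk>` marker is a placeholder, and the symbol your language-modelling toolkit actually expects should be checked in its documentation.

```python
UNK = "<unk>"  # placeholder marker; the real symbol depends on the LM toolkit


def oov_rate(sentences, vocab):
    """Fraction of word tokens that are not in the vocabulary."""
    total = sum(len(s) for s in sentences)
    unknown = sum(1 for s in sentences for w in s if w not in vocab)
    return unknown / total if total else 0.0


def drop_oov_sentences(sentences, vocab):
    """Closed-vocabulary option: keep only fully covered sentences."""
    return [s for s in sentences if all(w in vocab for w in s)]


def map_oov_to_unk(sentences, vocab, unk=UNK):
    """Open-vocabulary option: keep every sentence, mask unknown words."""
    return [[w if w in vocab else unk for w in s] for s in sentences]


vocab = {"the", "cat", "sat"}
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]

print(oov_rate(sentences, vocab))
print(drop_oov_sentences(sentences, vocab))
print(map_oov_to_unk(sentences, vocab))
```

If dropping sentences discards a large share of the corpus, masking is probably the less wasteful choice; if the out-of-vocabulary rate is tiny, either approach should give a similar model.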

For the record, I am following the walkthrough in chapter 15 of the HTK Book.

 

Step 2: expand the VoxForge Speaker Independent Acoustic Model Dictionary, using NLTK to retrieve sentences containing the most common words not included in the VoxForge dictionary file.

Several problems, however:

  • The walk-through linked from GRAMMAR_NOTES of the quickstart does not exist: http://www.voxforge.org/home/acousticmodels. How do I go about adding additional words to an existing acoustic model?
  • What is the difference between "lexicon/voxforge/VoxForgeDict" and "HTK_AcousticModel/dict" and which one should I use?
  • Is recording entire sentences (non-voxforge prompts) allowed?
  • What phonetic alphabet is used when adding new words?
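For Step 2, finding the most frequent corpus words that the dictionary lacks, and the sentences that contain them, might look like the sketch below. It assumes the dictionary is a text file with one entry per line whose first whitespace-separated field is the word (for example `CAT [CAT] k ae t`); that layout is an assumption about the file, and the function names are hypothetical.

```python
from collections import Counter


def load_dict_words(lines):
    """Collect the headword (first field) of every dictionary line."""
    words = set()
    for line in lines:
        fields = line.split()
        if fields:
            words.add(fields[0].lower())
    return words


def most_common_missing(sentences, dict_words, n=10):
    """Return the n most frequent words that are not in the dictionary."""
    counts = Counter(
        w.lower() for s in sentences for w in s if w.lower() not in dict_words
    )
    return counts.most_common(n)


def sentences_containing(sentences, word):
    """Return the sentences that contain the given (lower-case) word."""
    return [s for s in sentences if word in (w.lower() for w in s)]


# Tiny illustrative data; a real run would read the dictionary file and
# iterate over tokenized corpus sentences instead.
dict_lines = ["THE [THE] dh ax", "CAT [CAT] k ae t"]
corpus = [["the", "cat"], ["the", "zebra"], ["zebra", "crossing"]]

known = load_dict_words(dict_lines)
print(most_common_missing(corpus, known, n=1))
print(sentences_containing(corpus, "zebra"))
```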

Step 3: adapt the VoxForge Speaker Independent Acoustic Model to my voice

  1. Which chapter of the HTK Book explains the HHEd edit script commands?
  2. Any pointers for using HTS (a patched version of HTK) 3.4.1?

--- (Edited on 6/1/2012 12:43 am [GMT-0500] by Visitor) ---

Re: Adapting Acoustic Model w/ Language Model
User: nsh
Date: 6/3/2012 7:15 am
Views: 2774
Rating: 7

> Any recommendations for useful pre-compiled corpora?

Wikipedia dump

> Lastly, is accuracy ultimately important?

No

> Can a relatively small error (say 0.005) have a noticeable impact on speech recognition?

No

> Is an open vocabulary language model (containing unknown words) acceptable to julius

From the decoder's point of view it doesn't matter; it only affects the size of the language model. The decoder only considers words from the dictionary, even if the language model contains other words.

> The walk-through linked from GRAMMAR_NOTES of the quickstart does not exist: http://www.voxforge.org/home/acousticmodels. How do I go about adding additional words to an existing acoustic model?

This link from the latest quickstart works perfectly

http://julius.sourceforge.jp/en_index.php?q=en_grammar.html

> What is the difference between "lexicon/voxforge/VoxForgeDict" and "HTK_AcousticModel/dict" and which one should I use?

Dict includes only the words from the training prompts. VoxForgeDict is larger and includes all the words. You need to select which one to use according to your usage pattern. Usually VoxForgeDict is better just because it's bigger.

> Is recording entire sentences (non-voxforge prompts) allowed?

What do you mean by "allowed" here? There is no law about it; it's just software.

> What phonetic alphabet is used when adding new words?

The one that is already used in the dictionary. It doesn't have a specific name.
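Since the phone set is simply whatever the dictionary already uses, one way to check a new entry is to collect the phones that occur in the existing file and flag anything else. This is a sketch that assumes each line has the form `WORD [OUTPUT] phone phone ...` with the bracketed field optional; the helper names are hypothetical.

```python
def phone_inventory(dict_lines):
    """Collect every phone symbol used in the existing dictionary."""
    phones = set()
    for line in dict_lines:
        fields = line.split()
        rest = fields[1:]  # drop the headword
        if rest and rest[0].startswith("["):
            rest = rest[1:]  # drop the optional [OUTPUT] symbol
        phones.update(rest)
    return phones


def unknown_phones(pronunciation, inventory):
    """Return the phones in a proposed pronunciation that are not in use."""
    return [p for p in pronunciation.split() if p not in inventory]


dict_lines = ["THE [THE] dh ax", "CAT [CAT] k ae t"]
inventory = phone_inventory(dict_lines)

print(sorted(inventory))
print(unknown_phones("k ae t", inventory))
print(unknown_phones("z iy b r ax", inventory))
```

A phone that is not in the inventory has no trained model behind it, so a non-empty result from `unknown_phones` is a sign that the new pronunciation should be rewritten using existing symbols.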

> Which chapter of the HTK Book explains the HHEd edit script commands?

17.8 HHEd

> Any pointers for using HTS (a patched version of HTK) 3.4.1?

http://hts.sp.nitech.ac.jp/

 

--- (Edited on 6/3/2012 16:15 [GMT+0400] by nsh) ---
