VoxForge
Hello all,
I am working on a speech assessment system for people with speech disorders, currently built on pocketsphinx. In addition to the regular English phones, I need it to recognize specific mispronunciation sounds, such as glottal stops, pharyngeal fricatives, and hypernasal consonants.
To do this, I would like to train new phones and add them to the default English acoustic model, taking advantage of the existing model so that I do not have to start training from scratch. For training I would use recordings that contain both regular English phones and the mispronunciation phones, but I want to learn only the new phones from them. The existing acoustic model could also help produce a better segmentation of the recordings, so that the new phones are trained on the appropriate speech segments.
How can I do this using Sphinx? I suspect I need to make some tweaks to sphinxtrain, but I don't understand the training procedure well enough to get started. Any clue, thought, or opinion on the matter is welcome!
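For orientation, "adding new phones" in sphinxtrain amounts to extending the training database files. A minimal sketch, assuming a hypothetical setup named my_db and made-up phone symbols GS (glottal stop) and PF (pharyngeal fricative):

```
# etc/my_db.phone -- phone list: the standard English phones, plus the new ones
SIL
AH
B
N
T
GS
PF

# etc/my_db.dic -- pronunciations; a variant lets a word be realized with a new phone
BUTTON        B AH T AH N
BUTTON(2)     B AH GS AH N
```

Every phone used in the dictionary must appear in the phone list, and the transcripts must reference the variant that was actually spoken, so the trainer sees the new phones on the right segments.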
Thank you in advance for the help,
Cedric
--- (Edited on 5/21/2013 11:26 pm [GMT-0500] by ) ---
Please find the answer on CMUSphinx forum
https://sourceforge.net/p/cmusphinx/discussion/help/thread/d63376e7/
--- (Edited on 5/24/2013 04:21 [GMT+0400] by nsh) ---
Below is a copy of my answer on the CMUSphinx forum:
Hello Nickolay, thank you for your answer.
"An acoustic model contains context-dependent detectors for phones, not just phones."
Sure, I understand that, but still: all the context-dependent phones that do not have the new phones in their surrounding context have already been trained and should not need to be re-estimated, right? Also, I would expect the existing models to help segment the data and so improve the training of the new models.
In any case, my problem is that I have very little data available, not enough to train the entire set of English phones in addition to the new ones. I could possibly add speech files from VoxForge to my dataset, but if possible I wanted to make use of the acoustic model shipped with pocketsphinx, since it has so far given me better accuracy than the VoxForge model.
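To make "learn only the new phones from mixed recordings" concrete, here is a small sketch of one possible preprocessing step. The annotation scheme (WORD[GS] tags) and the variant-naming convention are entirely hypothetical, not an existing CMUSphinx tool: tagged words are rewritten to distinct dictionary variants that carry the new phones, while untagged words keep their standard pronunciations.

```python
import re

# Hypothetical scheme: an annotator marks a mispronounced word as WORD[GS]
# (glottal stop), WORD[PF] (pharyngeal fricative), etc.  Each tagged word is
# rewritten to a distinct dictionary variant (BUTTON_GS) whose pronunciation
# contains the new phone, so the trainer learns the new phones only on those
# segments.
TAG = re.compile(r"^([A-Z']+)\[([A-Z]+)\]$")

def rewrite_token(token):
    """Map 'BUTTON[GS]' -> 'BUTTON_GS'; leave untagged tokens unchanged."""
    m = TAG.match(token)
    return f"{m.group(1)}_{m.group(2)}" if m else token

def rewrite_transcript(line, uttid):
    """Produce a sphinxtrain-style transcription line for one utterance."""
    words = " ".join(rewrite_token(t) for t in line.split())
    return f"<s> {words} </s> ({uttid})"

print(rewrite_transcript("SHE PRESSED THE BUTTON[GS]", "spk1_0001"))
# -> <s> SHE PRESSED THE BUTTON_GS </s> (spk1_0001)
```

The rewritten transcripts, together with dictionary entries for the _GS variants, would then feed a normal sphinxtrain run seeded from the existing model.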
--- (Edited on 5/23/2013 9:24 pm [GMT-0500] by Cedric) ---