VoxForge
Also, I figured out that the "HHEd -A -D -T 1 -H hmm12/macros -H hmm12/hmmdefs -M hmm13 tree.hed triphones1" command uses fulllist via the tree.hed file. However, the fulllist file contains many phones that I didn't use at all. I am confused because triphones1 and tree.hed contain exactly the phones I used, but fulllist doesn't. So what about tiedlist? Why doesn't HHEd use tiedlist?
> the fulllist file contains many phones that I didn't use at all.
from Step 10 of the tutorial:
Decision tree clustering used here allows previously unseen triphones to be synthesized.
What this means is that it creates an acoustic model using all the words in your pronunciation dictionary. This works OK if you have about 3-5 recordings for all the phones in your pronunciation dictionary.
However, if you are missing one, you get errors like:
>no proto for d in hset.
It looks like you need to add a few more recordings containing words that have the d phone in them... there may be more, because the process stops at the first missing phone.
> It looks like you need to add a few more recordings containing words that have the d phone in them... there may be more, because the process stops at the first missing phone.
Hi. Thank you for answering. However, I already used 56 words and 319 phonemes in total for the 18 kinds of phonemes that make up my .voca file. My dlog file says "56 words required, 0 missing", and I used many words containing the phonemes included in the .voca file. However, the problem is "no proto for th-hh+eh in hSet". How can I know, while recording, which triphones will be needed? What is the main algorithm for choosing recording words based on the phonemes they contain?
One thing to try is to get a count of all the phonemes used in the voca, then a count of all the phonemes contained in the words in prompts as defined in your lexicon. As Ken says, the counts in the prompts should show that every phone in your voca is represented in the prompts at least once and preferably 5 times or more.
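The counting suggested above could be scripted. Here is a minimal sketch; the file names and line formats (a lexicon with `WORD ph1 ph2 ...` per line, and a prompts file whose first field is the recording name followed by the spoken words) are assumptions based on the usual VoxForge/HTK tutorial layout, so adjust the parsing to match your actual files:

```python
# Sketch: count how often each phone appears in the prompts, using the
# lexicon's word-to-phone mapping. Formats are assumed, not guaranteed.
from collections import Counter

def load_lexicon(path):
    """Parse a lexicon with one 'WORD ph1 ph2 ...' entry per line (assumed format)."""
    lex = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if parts:
                lex[parts[0]] = parts[1:]
    return lex

def prompt_phone_counts(lexicon, prompts_path):
    """Count phones over a prompts file whose lines look like
    '*/sample1 WORD1 WORD2 ...' (assumed format)."""
    counts = Counter()
    with open(prompts_path) as f:
        for line in f:
            for word in line.split()[1:]:   # skip the recording-name field
                counts.update(lexicon.get(word.upper(), []))
    return counts
```

Every phone listed in your .voca should come back with a count of at least 1, preferably 5 or more.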
Another thing to look at is your lexicon, and whether your words are properly represented by phone combinations. The example you give is very strange; of course the choice of a phoneme coding is up to you, as long as it is consistent; if you are representing the English word "the" then it is possible that the analysis cannot detect the phones in the audio that you say should be there. You could try an alternative simpler representation of the word that is causing the problem, say removing the hh.
> One thing to try is to get a count of all the phonemes used in the voca, then a count of all the phonemes contained in the words in prompts as defined in your lexicon. As Ken says, the counts in the prompts should show that every phone in your voca is represented in the prompts at least once and preferably 5 times or more.
> Another thing to look at is your lexicon, and whether your words are properly represented by phone combinations. The example you give is very strange; of course the choice of a phoneme coding is up to you, as long as it is consistent; if you are representing the English word "the" then it is possible that the analysis cannot detect the phones in the audio that you say should be there. You could try an alternative simpler representation of the word that is causing the problem, say removing the hh.
======================================================
I changed my prompts.txt. I just want to recognize two words, "BIR" and "IKI". These words contain 5 phones in total (b, er, ih, k, iy). To get more than 3-5 examples of each of my phonemes, here is my dlog below:
30 words required, 0 missing
New Phone Usage Counts
---------------------
1. b : 5
2. iy : 15
3. ch : 9
4. er : 10
5. sp : 28
6. d : 9
7. r : 6
8. ih : 11
9. k : 17
10. ey : 3
11. n : 9
12. hh : 5
13. aa : 18
14. m : 6
15. ah : 14
16. y : 6
17. t : 4
18. sil : 2
That seems like plenty to recognize only the "b", "er", "ih", "k", "iy" phones. I trained successfully up to hmm12 with no problems along the way. I checked my HVite_log and there is no problem there either; the phones match my prompts.txt. However, when I ran "HHEd -A -D -T 1 -H hmm12/macros -H hmm12/hmmdefs -M hmm13 tree.hed triphones1" I again got an error: "no proto for f-uw+b in hSet". Please look at my phones. There is no "f" or "uw" at all!
So does my recording contain "uw" or "f" phones that Julius is finding, or what?
This is exactly where I'm stuck. What, basically, should I do?
> again I got an error: "no proto for f-uw+b in hSet". Please look at my
> phones. There is no "f" or "uw" at all!
I know, this is confusing... but such is HTK.
in Step 9, you created triphones from the words in your training set. (triphones1 is the list of triphones in your training set)
in Step 10, you create triphones for all the words in your pronunciation dictionary, including words that never appeared in your training set (fulllist is the list of triphones for all the words in your pronunciation dictionary, which contains many more triphones than triphones1).
So your error occurs because in Step 10 you are trying to create an acoustic model that contains enough speech data to recognize all the words in your pronunciation dictionary, not just your training set.
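One way to see exactly which triphones Step 10 expects but Step 9 never saw is a set difference between the two lists. A small sketch, assuming (as in the tutorial) that fulllist and triphones1 contain one HTK model name per line:

```python
def missing_triphones(fulllist_path, triphones1_path):
    """Return triphones listed in fulllist but absent from triphones1.
    Assumes one model name (e.g. 'f-uw+b') per line in each file."""
    with open(fulllist_path) as f:
        full = {line.strip() for line in f if line.strip()}
    with open(triphones1_path) as f:
        seen = {line.strip() for line in f if line.strip()}
    return sorted(full - seen)
```

Any triphone in the result (like the f-uw+b above) is one that Step 10 has to synthesize from training data it may not have.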
Your options:
1. stop at Step 9: if all you want to do is recognize two words (i.e. "BIR" and "IKI"), then you should get reasonable results with the hmmdefs file in hmm12.
2. fudge Step 10: everywhere you use an HTK command that requires the Step 10 fulllist file, use the Step 9 triphones1 file instead. This might give you better recognition results than stopping at Step 9, and will reduce the final size of your acoustic model by tying HMM states.
3. add recordings for words containing the f and uw phones, and any other monophones that might be missing. You should not need to add words containing every missing triphone... Step 10 should be able to synthesize any missing triphones.
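For option 3, you can work out which monophones (like f and uw here) lack training data by splitting each triphone name in fulllist into its component phones and comparing against the monophones you actually trained. A sketch, assuming HTK's usual `left-centre+right` naming (with the context parts optional):

```python
import re

def monophones_of(triphone):
    """Split an HTK model name like 'f-uw+b', 'b+ih', 'ih-k' or plain
    'sil' into its component monophones."""
    m = re.fullmatch(r'(?:(.+?)-)?(.+?)(?:\+(.+))?', triphone)
    left, centre, right = m.groups()
    return [p for p in (left, centre, right) if p]

def missing_monophones(fulllist_path, trained_monophones):
    """Monophones referenced by fulllist triphones but never trained."""
    needed = set()
    with open(fulllist_path) as f:
        for line in f:
            name = line.strip()
            if name:
                needed.update(monophones_of(name))
    return sorted(needed - set(trained_monophones))
```

The result tells you which phones to cover with new recordings, before worrying about any particular triphone.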
Hope this helps!