VoxForge
Hi Ken,
When I use the Perl script "prompts2mlf" to convert the TIMIT prompts into a TIMIT trainword.mlf, I am not sure how to deal with some symbols, such as "," in the middle of a sentence, "-" between words, and "?" in the prompts. We will not build a "," model, a "-" model, or a "?" model, but these symbols do appear in my trainword.mlf. How should I deal with them?
Question two:
In the TIMIT speech audio, the same prompt sentence is often read by several different speakers. Should I repeat these prompts in my trainword.mlf (same text, but with different file paths)?
Waiting for your answer :)
> I wonder How to deal with,such as symbols ","in the middle of the
>sentence,a
Use Perl regex...
Keith Vertanen's HTK Wall Street Journal Training Recipe might help too.
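As a rough illustration of the kind of regex cleanup meant here, below is a minimal sketch in Python rather than Perl (the same patterns carry over to a Perl substitution). The function name clean_prompt and the choice to keep apostrophes are assumptions for this example, not part of prompts2mlf. Hyphens are turned into spaces instead of being deleted, so that a hyphenated compound is split into separate words rather than fused into one token.

```python
import re

def clean_prompt(text):
    """Return the upper-cased words of a prompt with punctuation removed."""
    # Treat a hyphen between words as a word break instead of deleting it.
    text = text.replace("-", " ")
    # Drop everything that is not a letter, an apostrophe, or whitespace
    # (this removes ",", "?", ".", and similar symbols).
    text = re.sub(r"[^A-Za-z'\s]", "", text)
    return text.upper().split()

if __name__ == "__main__":
    # One word per line, as the word-level labels are usually listed.
    for word in clean_prompt("Rock-and-roll, y'all?"):
        print(word)
```

Whether apostrophes should be kept depends on how the words are spelled in the pronunciation dictionary you use.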
Ken
Thank you!
I have another question that needs your help :)
When I use HDMan to build my own dictionary, the output log file (dlog) lists some words that do not exist in the BEEP dictionary, so I get a "Missing Words" warning. How can I solve this problem? Should I change the dictionary, for example by using the TIMIT dictionary instead of the BEEP dictionary (they have different phone sets)? Or should I add the missing words to the BEEP dictionary? Here is my dlog file:
Missing Words
-------------
ACCELEROMETERS
ARCHEOLOGICAL
COOKEDOVER
CORY
FINEFEATHERED
GUS
HALFINCH
INTERVIEWEE
JUNGLELIKE
MAIDS'
NOTHIN
ODOR
PLAYIN'
PLOW
ROCKANDROLL
SEMIHEIGHTS
SHAWN
UNFENCED
UPBEAT
Y'ALL
Dictionary Usage Statistics
---------------------------
Dictionary  TotalWords  WordsUsed  TotalProns  PronsUsed
beep            237399       1188      256679       1352
dict1             1188       1188        1352       1352
1208 words required, 20 missing
New Phone Usage Counts
---------------------
1. ah : 103
2. sp : 1352
3. ax : 470
4. ey : 123
5. b : 138
6. l : 363
7. iy : 189
8. aw : 42
9. t : 461
10. ae : 133
11. k : 308
12. d : 288
13. eh : 196
14. m : 215
15. ih : 453
16. ao : 95
17. ng : 70
18. n : 420
19. s : 378
20. ch : 48
21. v : 107
22. sh : 86
23. z : 215
24. jh : 75
25. er : 58
26. aa : 60
27. f : 132
28. r : 378
29. g : 95
30. hh : 58
31. ea : 24
32. p : 208
33. ow : 100
34. oh : 104
35. dh : 36
36. w : 99
37. th : 27
38. oy : 24
39. y : 68
40. uw : 81
41. ay : 123
42. zh : 7
43. ua : 19
44. ia : 47
45. uh : 36
Dictionary ./dict/dict1 created
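To turn a log like the one above into a word list that can be checked or added to a dictionary by hand, something along these lines may help. It is only a sketch, assuming the log keeps the layout shown here (a "Missing Words" heading, a dashed underline, one word per line, then "Dictionary Usage Statistics"); the names parse_missing_words and SAMPLE_LOG are made up for this example.

```python
def parse_missing_words(log_text):
    """Collect the words listed under the 'Missing Words' heading of the log."""
    missing = []
    in_section = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == "Missing Words":
            in_section = True
            continue
        if not in_section:
            continue
        if stripped.startswith("Dictionary Usage Statistics"):
            break  # end of the missing-word list
        # Skip blank lines and the dashed underline.
        if not stripped or set(stripped) == {"-"}:
            continue
        missing.append(stripped)
    return missing

# The relevant part of the log quoted in this post.
SAMPLE_LOG = """Missing Words
-------------
ACCELEROMETERS
ARCHEOLOGICAL
COOKEDOVER
CORY
FINEFEATHERED
GUS
HALFINCH
INTERVIEWEE
JUNGLELIKE
MAIDS'
NOTHIN
ODOR
PLAYIN'
PLOW
ROCKANDROLL
SEMIHEIGHTS
SHAWN
UNFENCED
UPBEAT
Y'ALL
Dictionary Usage Statistics
---------------------------
"""

if __name__ == "__main__":
    words = parse_missing_words(SAMPLE_LOG)
    # 1188 words found plus the missing ones should give the 1208 required.
    print(len(words), 1188 + len(words))
```

Entries such as COOKEDOVER, HALFINCH and ROCKANDROLL look like hyphenated words whose hyphen was deleted, so those may be fixable in the prompts rather than in the dictionary.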
> Should I change the dictionary, such as using the TIMIT dict to replace
> the BEEP dict (they have different phone sets)? Or should I add the missing
> words to the BEEP dict?
That is up to you.
In this case, using the TIMIT dictionary with the TIMIT corpus seems to make sense... Why were you using BEEP in the first place? Is it an application requirement, or were you looking for a pronunciation dictionary with a larger number of words?
Ken
Thank you, Ken!
The TIMIT dictionary has 6249 words, but the BEEP dictionary has 256711 words. The BEEP dictionary also lists several pronunciations for some words, whereas the TIMIT dictionary has only one pronunciation per word. That is why I chose the BEEP dictionary at first.
Now another problem comes up: if I use the TIMIT dictionary for training, do I still need to realign the training data in Step 8, given that there is only one pronunciation for each word in the TIMIT dictionary?
Sorry to disturb you again!
spring
> If I use the TIMIT dict for training, do I need to realign the training data
> in Step 8?
You likely have to restart with Step 6 - Creating Flat Start Monophones since you are using a completely different set of phonemes.
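One way to see the mismatch is to compare the phone inventories of the two dictionaries: any phone that appears only in the new dictionary has no trained model behind it. The sketch below is in Python and assumes simple HTK-style lines of the form "WORD phone phone ..." with an optional [OUTPUT] symbol; the function name phone_set and the two tiny dictionaries are made up for illustration and are not taken from BEEP or TIMIT.

```python
def phone_set(dict_lines):
    """Collect the phones used in dictionary lines of the form 'WORD p1 p2 ...'."""
    phones = set()
    for line in dict_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or a word with no pronunciation
        rest = fields[1:]
        # Skip an optional [OUTPUT] symbol after the word.
        # Pronunciation probabilities are not handled in this sketch.
        if rest[0].startswith("["):
            rest = rest[1:]
        phones.update(rest)
    return phones

# Hypothetical entries, invented for this example only.
old_dict = ["ODOUR ow d ax sp", "HAIR hh ea sp"]
new_dict = ["ODOR ow dx er", "HAIR hh eh r"]

# Phones the new dictionary needs that the old one never used.
untrained = sorted(phone_set(new_dict) - phone_set(old_dict))

if __name__ == "__main__":
    print(untrained)
```

If that list is not empty, the existing monophone models do not cover the new dictionary.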
Ken