User:
Visitor
Date: 8/9/2010 1:42 am
Software version: PocketSphinx 0.6.1.
The launch parameters and the log are given below.
Expected result: the recognized Russian text printed to the console. The dictionary encoding was converted from UTF-8 to KOI8-R for the reasons described above.
Instead, I get several errors and no Russian text in the console. With the English acoustic models from this site and from http://cmusphinx.sourceforge.net/ I get a correct result.
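For reference, the dictionary re-encoding mentioned above (UTF-8 to KOI8-R) can be sketched roughly as follows. This is only an illustration, not the exact procedure used here: the function name `convert_encoding` is made up for this example, and the name of the UTF-8 source file in the usage note is an assumption (only `etc/msu_ru_nsh_koi8.dic` appears in the command line below).

```python
from pathlib import Path


def convert_encoding(src_path, dst_path, src_enc="utf-8", dst_enc="koi8-r"):
    """Re-encode a text file and return the number of lines converted.

    Hypothetical helper for illustration; it is not part of PocketSphinx.
    """
    text = Path(src_path).read_text(encoding=src_enc)
    # errors="strict" raises UnicodeEncodeError if some character has no
    # KOI8-R representation, instead of silently dropping or replacing it.
    Path(dst_path).write_bytes(text.encode(dst_enc, errors="strict"))
    return len(text.splitlines())
```

A call would look like `convert_encoding("etc/msu_ru_nsh.dic", "etc/msu_ru_nsh_koi8.dic")`, where the first path is assumed. Strict error handling is chosen so that a dictionary entry which cannot be represented in KOI8-R is reported immediately rather than ending up as a corrupted word.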
# ./ru_wav_decode.sh
INFO: cmd_ln.c(512): Parsing command line:
pocketsphinx_batch \
-hmm model_parameters/msu_ru_nsh.cd_cont_1000_8gau_8000 \
-lm etc/msu_ru_nsh.lm.dmp \
-dict etc/msu_ru_nsh_koi8.dic \
-fdict etc/msu_ru_nsh.filler \
-cepdir . \
-cepext .raw \
-adcin yes \
-backtrace yes \
-ctl ru_wav.ctl
Current configuration:
[NAME] [DEFLT] [VALUE]
-adchdr 0 0
-adcin no yes
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-backtrace no yes
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-build_outdirs yes yes
-cepdir .
-cepext .mfc .raw
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-ctl ru_wav.ctl
-ctlcount -1 -1
-ctlincr 1 1
-ctloffset 0 0
-ctm
-debug 0
-dict etc/msu_ru_nsh_koi8.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict etc/msu_ru_nsh.filler
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgctl
-fsgdir
-fsgext
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm model_parameters/msu_ru_nsh.cd_cont_1000_8gau_8000
-hyp
-hypseg
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm etc/msu_ru_nsh.lm.dmp
-lmctl
-lmname default default
-lmnamectl
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mllrctl
-mllrdir
-mllrext
-mmap yes yes
-nbest 0 0
-nbestdir
-nbestext .hyp .hyp
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-outlatdir
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: cmd_ln.c(512): Parsing command line:
\
-alpha 0.97 \
-dither yes \
-doublebw no \
-nfilt 31 \
-ncep 13 \
-lowerf 130.0 \
-upperf 3700.0 \
-nfft 256 \
-wlen 0.0256 \
-samprate 8000 \
-transform legacy \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-varnorm no
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-dither no yes
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.300000e+02
-ncep 13 13
-nfft 512 256
-nfilt 40 31
-remove_dc no no
-round_filters yes yes
-samprate 16000 8.000000e+03
-seed -1 -1
-smoothspec no no
-svspec
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 3.700000e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.560000e-02
INFO: acmod.c(238): Parsed model-specific feature parameters from model_parameters/msu_ru_nsh.cd_cont_1000_8gau_8000/feat.params
INFO: fe_interface.c(288): You are using the internal mechanism to generate the seed.
INFO: feat.c(848): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: mdef.c(520): Reading model definition: model_parameters/msu_ru_nsh.cd_cont_1000_8gau_8000/mdef
INFO: bin_mdef.c(173): Allocating 113108 * 8 bytes (883 KiB) for CD tree
INFO: tmat.c(205): Reading HMM transition probability matrices: model_parameters/msu_ru_nsh.cd_cont_1000_8gau_8000/transition_matrices
INFO: acmod.c(117): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: model_parameters/msu_ru_nsh.cd_cont_1000_8gau_8000/means
INFO: ms_gauden.c(292): 1153 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: model_parameters/msu_ru_nsh.cd_cont_1000_8gau_8000/variances
INFO: ms_gauden.c(292): 1153 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(356): 1418 variance values floored
INFO: acmod.c(119): Attempting to use PTHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: model_parameters/msu_ru_nsh.cd_cont_1000_8gau_8000/means
INFO: ms_gauden.c(292): 1153 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: model_parameters/msu_ru_nsh.cd_cont_1000_8gau_8000/variances
INFO: ms_gauden.c(292): 1153 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(356): 1418 variance values floored
ERROR: "ptm_mgau.c", line 801: Number of codebooks exceeds 256: 1153
INFO: acmod.c(121): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: model_parameters/msu_ru_nsh.cd_cont_1000_8gau_8000/means
INFO: ms_gauden.c(292): 1153 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: model_parameters/msu_ru_nsh.cd_cont_1000_8gau_8000/variances
INFO: ms_gauden.c(292): 1153 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(356): 1418 variance values floored
INFO: ms_senone.c(160): Reading senone mixture weights: model_parameters/msu_ru_nsh.cd_cont_1000_8gau_8000/mixture_weights
INFO: ms_senone.c(211): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(218): Not transposing mixture weights in memory
INFO: ms_senone.c(277): Read mixture weights for 1153 senones: 1 features x 8 codewords
INFO: ms_senone.c(331): Mapping senones to individual codebooks
INFO: ms_mgau.c(123): The value of topn: 4
INFO: dict.c(294): Allocating 190163 * 20 bytes (3714 KiB) for word entries
INFO: dict.c(306): Reading main dictionary: etc/msu_ru_nsh_koi8.dic
INFO: dict.c(206): Allocated 1606 KiB for strings, 3214 KiB for phones
INFO: dict.c(309): 186063 words read
INFO: dict.c(314): Reading filler dictionary: etc/msu_ru_nsh.filler
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(317): 4 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(405): Allocating 51^3 * 2 bytes (259 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 31416 bytes (30 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 31416 bytes (30 KiB) for single-phone word triphones
ERROR: "ngram_model_arpa.c", line 76: No \data\ mark in LM file
INFO: ngram_model_dmp.c(141): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(195): ngrams 1=30086, 2=849050, 3=1790728
INFO: ngram_model_dmp.c(241): 30086 = LM.unigrams(+trailer) read
WARNING: "ngram_model_dmp.c", line 252: -mmap specified, but tseg_base is not word-aligned. Will not memory-map.
INFO: ngram_model_dmp.c(289): 849050 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(314): 1790728 = LM.trigrams read
INFO: ngram_model_dmp.c(338): 11663 = LM.prob2 entries read
INFO: ngram_model_dmp.c(357): 7054 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(377): 8028 = LM.prob3 entries read
INFO: ngram_model_dmp.c(405): 1659 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(461): 30086 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 859 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 18 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 18 single-phone words
INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 65976
INFO: ngram_search_fwdtree.c(333): after: 676 root, 65848 non-root channels, 16 single-phone words
INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: cmn.c(175): CMN: 8.45 -0.04 -0.34 0.50 -0.17 0.08 -0.15 -0.02 -0.10 -0.08 -0.14 -0.22 0.01
INFO: ngram_search_fwdtree.c(933): cand_sf[] increased to 64 entries
INFO: ngram_search.c(407): Resized backpointer table to 10000 entries
INFO: ngram_search.c(407): Resized backpointer table to 20000 entries
INFO: ngram_search.c(415): Resized score stack to 200000 entries
INFO: ngram_search.c(407): Resized backpointer table to 40000 entries
INFO: ngram_search.c(415): Resized score stack to 400000 entries
INFO: ngram_search.c(407): Resized backpointer table to 80000 entries
INFO: ngram_search.c(415): Resized score stack to 800000 entries
INFO: ngram_search_fwdtree.c(1513): 56752 words recognized (56/fr)
INFO: ngram_search_fwdtree.c(1515): 899965 senones evaluated (881/fr)
INFO: ngram_search_fwdtree.c(1517): 11931636 channels searched (11674/fr), 668027 1st, 582656 last
INFO: ngram_search_fwdtree.c(1521): 91827 words for which last channels evaluated (89/fr)
INFO: ngram_search_fwdtree.c(1524): 730786 candidate words for entering last phone (715/fr)
INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 1497 words
INFO: ngram_search_fwdflat.c(912): 39038 words recognized (38/fr)
INFO: ngram_search_fwdflat.c(914): 535096 senones evaluated (524/fr)
INFO: ngram_search_fwdflat.c(916): 1933711 channels searched (1892/fr)
INFO: ngram_search_fwdflat.c(918): 273575 words searched (267/fr)
INFO: ngram_search_fwdflat.c(920): 102748 word transitions (100/fr)
ERROR: "ngram_search.c", line 1034: Couldn't find <s> in first frame
INFO: pocketsphinx.c(805): wav/mm: (null) (158990392)
INFO: word start end pprob ascr lscr lback
ERROR: "ngram_search.c", line 1034: Couldn't find <s> in first frame
ERROR: "ngram_search.c", line 1034: Couldn't find <s> in first frame
INFO: batch.c(661): wav/mm: 10.21 seconds speech, 10.80 seconds CPU, 11.17 seconds wall
INFO: batch.c(663): wav/mm: 1.06 xRT (CPU), 1.09 xRT (elapsed)
INFO: batch.c(675): TOTAL 10.21 seconds speech, 10.80 seconds CPU, 11.17 seconds wall
INFO: batch.c(677): AVERAGE 1.06 xRT (CPU), 1.09 xRT (elapsed)
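Since the errors are easy to lose among the INFO lines, a small filter can pull them out of a log like the one above. This is a minimal sketch that assumes the log has been saved to a text file and that problem lines keep the `LEVEL: "file", line N: message` layout seen above; the function name `summarize_problems` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   ERROR: "ngram_search.c", line 1034: Couldn't find <s> in first frame
#   WARNING: "ngram_model_dmp.c", line 252: -mmap specified, but ...
PROBLEM_LINE = re.compile(r'^(ERROR|WARNING): "([^"]+)", line (\d+): (.*)$')


def summarize_problems(log_text):
    """Count ERROR/WARNING lines, keyed by (level, source file, line, message)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PROBLEM_LINE.match(line.strip())
        if match:
            level, source, lineno, message = match.groups()
            counts[(level, source, int(lineno), message)] += 1
    return counts
```

Applied to the log above, it should report four distinct messages: one error each from `ptm_mgau.c` and `ngram_model_arpa.c`, one warning from `ngram_model_dmp.c`, and the `Couldn't find <s> in first frame` error from `ngram_search.c` three times.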