Running Julius on 64-bit Ubuntu 10.04
User: Cerin
Date: 9/29/2011 9:20 pm

Is there any trick to getting the Linux quickstart to work on a 64-bit laptop running Ubuntu 10.04? I confirmed my laptop's mic works by recording speech with Audacity, but when I run the command the README suggests, "./julian -input mic -C julian.jconf", it seems to initialize, and I get the prompt:

 

### read waveform input
<<< please speak >>>

but no matter what I say, nothing seems to happen.


How are you supposed to use this? Does it print text to the console as you speak, or does it save text to a file somewhere?


I have a few simple WAV files that I've been trying to process with Julian, using:

./julian -input rawfile -filelist filelist.txt -C julian.jconf

Julian seems to run through some startup procedure, but doesn't seem to output any converted text, and then exits with the message:

------------- System Info end -------------

------
### read waveform input
adin_file: channel num != 1 (2)
Error: failed to read /tmp/audio/hello-2.wav as a wav file
*** glibc detected *** ./julian: corrupted double-linked list: 0x0954cff0 ***

Why couldn't it read my wav file?

--- (Edited on 9/29/2011 9:20 pm [GMT-0500] by Cerin) ---
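
The "channel num != 1 (2)" message above indicates the file is stereo, while Julius expected a single channel. The errors quoted later in this thread suggest it also checks the sampling rate (16000 here) and the bytes-per-second field. A minimal sketch, assuming Python 3 and its standard wave module, for inspecting a file before handing it to Julius. The name check_wav_for_julius is hypothetical, and the 16-bit requirement is inferred from the later bytes-per-second error rather than confirmed in this thread:

```python
import wave


def check_wav_for_julius(path, expected_rate=16000):
    """Return a list of likely rejection reasons (empty if none were found).

    Assumption: Julius wants mono, 16-bit samples at expected_rate Hz.
    """
    problems = []
    with wave.open(path, "rb") as w:
        channels = w.getnchannels()
        rate = w.getframerate()
        width = w.getsampwidth()  # bytes per sample

    if channels != 1:
        problems.append("channel num != 1 (%d)" % channels)
    if rate != expected_rate:
        problems.append("sampling rate != %d (%d)" % (expected_rate, rate))
    if width != 2:
        problems.append("sample width != 16 bit (%d bit)" % (width * 8))
    return problems
```

Calling check_wav_for_julius("/tmp/audio/hello-2.wav") on the file from the post above would be expected to report at least the channel mismatch.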

Re: Running Julius on 64-bit Ubuntu 10.04
User: Cerin
Date: 9/29/2011 9:45 pm

I've also tried running the "controlapp" example included in the Ubuntu package. I ran through the README's instructions, and it seems to run without error, but all it does is show the text "Taking control of Rhythmbox media player", and nothing happens when I speak into the mic.

 

Running julius without piping it into command.py shows the following. Am I doing anything wrong?

 

STAT: include config: julian.jconf
STAT: jconf successfully finalized
STAT: *** loading AM00 _default
Stat: init_phmm: Reading in HMM definition
Stat: rdhmmdef: ascii format HMM definition
Stat: rdhmmdef: limit check passed
Stat: check_hmm_restriction: an HMM with several arcs from initial state found: "sp"
Stat: rdhmmdef: this HMM requires multipath handling at decoding
Stat: init_phmm: defined HMMs:  8002
Stat: init_phmm: loading ascii hmmlist
Stat: init_phmm: logical names:  9406 in HMMList
Stat: init_phmm: base phones:    44 used in logical
Stat: init_phmm: finished reading HMM definitions
STAT: making pseudo bi/mono-phone for IW-triphone
Stat: hmm_lookup: 1085 pseudo phones are added to logical HMM list
STAT: *** AM00 _default loaded
STAT: *** loading LM00 _default
STAT: reading [mediaplayer.dfa] and [mediaplayer.dict]...
Stat: init_voca: read 9 words
STAT: done
STAT: Gram #0 mediaplayer registered
STAT: Gram #0 mediaplayer: new grammar loaded, now mash it up for recognition
STAT: Gram #0 mediaplayer: extracting category-pair constraint for the 1st pass
STAT: Gram #0 mediaplayer: installed
STAT: grammar update completed
STAT: *** LM00 _default loaded
STAT: ------
STAT: All models are ready, go for final fusion
STAT: [1] create MFCC extraction instance(s)
STAT: *** create MFCC calculation modules from AM
STAT: AM 0 _default: create a new module MFCC01
STAT: 1 MFCC modules created
STAT: [2] create recognition processing instance(s) with AM and LM
STAT: composing recognizer instance SR00 _default (AM00 _default, LM00 _default)
STAT: Building HMM lexicon tree
STAT: lexicon size: 126 nodes
STAT: coordination check passed
STAT: multi-gram: beam width set to 126 (guess) by lexicon change
STAT: wchmm (re)build completed
STAT: SR00 _default composed
STAT: [3] initialize for acoustic HMM calculation
Stat: outprob_init: state-level mixture PDFs, use calc_mix()
Stat: addlog: generating addlog table (size = 1953 kB)
Stat: addlog: addlog table generated
STAT: [4] prepare MFCC storage(s)
STAT: [5] prepare for real-time decoding
STAT: All init successfully done

STAT: ###### initialize input device
Stat: adin_oss: device name = /dev/dsp
Stat: adin_oss: sampling rate = 16000Hz
Stat: adin_oss: going to set latency to 50 msec
Stat: adin_oss: audio I/O Latency = 32 msec (fragment size = 512 samples)
----------------------- System Information begin ---------------------
JuliusLib rev.4.1.2 (fast)

Engine specification:
 -  Base setup   : fast
 -  Supported LM : DFA, N-gram, Word
 -  Extension    :
 -  Compiled by  : cc -g -O2 -g -Wall -O2

------------------------------------------------------------
Configuration of Modules

 Number of defined modules: AM=1, LM=1, SR=1

 Acoustic Model (with input parameter spec.):
 - AM00 "_default"
    hmmfilename=/usr/share/julius-voxforge/acoustic/hmmdefs
    hmmmapfilename=/usr/share/julius-voxforge/acoustic/tiedlist

 Language Model:
 - LM00 "_default"
    grammar #1:
        dfa  = mediaplayer.dfa
        dict = mediaplayer.dict

 Recognizer:
 - SR00 "_default" (AM00, LM00)

------------------------------------------------------------
Speech Analysis Module(s)

[MFCC01]  for [AM00 _default]

 Acoustic analysis condition:
           parameter = MFCC_0_D_N_Z (25 dim. from 12 cepstrum + c0, abs energy supressed with CMN)
    sample frequency = 16000 Hz
       sample period =  625  (1 = 100ns)
         window size =  400 samples (25.0 ms)
         frame shift =  160 samples (10.0 ms)
        pre-emphasis = 0.97
        # filterbank = 24
       cepst. lifter = 22
          raw energy = False
    energy normalize = False
        delta window = 2 frames (20.0 ms) around
         hi freq cut = OFF
         lo freq cut = OFF
     zero mean frame = OFF
           use power = OFF
                 CVN = OFF
                VTLN = OFF
    spectral subtraction = off
  cepstral normalization = real-time MAP-CMN
     base setup from = Julius defaults

 MAP-CMN:
      initial cep. data   = none
      beginning data weight = 100.00
    beginning data update = yes, from last inputs at each input

------------------------------------------------------------
Acoustic Model(s)

[AM00 "_default"]

 HMM Info:
    8002 models, 5950 states, 5950 mpdfs, 5950 Gaussians are defined
          model type = context dependency handling ON
      training parameter = MFCC_N_D_Z_0
       vector length = 25
    number of stream = 1
         stream info = [0-24]
    cov. matrix type = DIAGC
       duration type = NULLD
    max mixture size = 1 Gaussians
     max length of model = 5 states
     logical base phones = 44
       model skip trans. = exist, require multi-path handling
      skippable models = sp (1 model(s))

 AM Parameters:
        Gaussian pruning = safe  (-gprune)
  top N mixtures to calc = 2 / 0  (-tmix)
    short pause HMM name = "sp" specified, "sp" applied (physical)  (-sp)
  cross-word CD on pass1 = handle by approx. (use max. prob. of same LC)
   sp transition penalty = -70.0

------------------------------------------------------------
Language Model(s)

[LM00 "_default"] type=grammar

 DFA grammar info:
      5 nodes, 4 arcs, 4 terminal(category) symbols
      category-pair matrix: 20 bytes (544 bytes allocated)

 Vocabulary Info:
        vocabulary size  = 9 words, 33 models
        average word len = 3.7 models, 11.0 states
       maximum state num = 24 nodes per word
       transparent words = not exist
       words under class = not exist

 Parameters:
   found sp category IDs =

------------------------------------------------------------
Recognizer(s)

[SR00 "_default"]  AM00 "_default"  +  LM00 "_default"

 Lexicon tree:
     total node num =    126
      root node num =      9
      leaf node num =      9

    (-penalty1) IW penalty1 = +5.0
    (-penalty2) IW penalty2 = +20.0
    (-cmalpha)CM alpha coef = 0.050000

     inter-word short pause = on (append "sp" for each word tail)
      sp transition penalty = -70.0
 Search parameters:
        multi-path handling = yes, multi-path mode enabled
    (-b) trellis beam width = 126 (-1 or not specified - guessed)
    (-n)search candidate num= 1
    (-s)  search stack size = 500
    (-m)    search overflow = after 2000 hypothesis poped
            2nd pass method = searching sentence, generating N-best
    (-b2)  pass2 beam width = 200
    (-lookuprange)lookup range= 5  (tm-5 <= t <tm+5)
    (-sb)2nd scan beamthres = 200.0 (in logscore)
    (-n)        search till = 1 candidates found
    (-output)    and output = 1 candidates out of above
     IWCD handling:
       1st pass: approximation (use max. prob. of same LC)
       2nd pass: loose (apply when hypo. is popped and scanned)
     all possible words will be expanded in 2nd pass
     build_wchmm2() used
     lcdset limited by word-pair constraint
    short pause segmentation = off
    fall back on search fail = off, returns search failure

------------------------------------------------------------
Decoding algorithm:

    1st pass input processing = real time, on-the-fly
    1st pass method = 1-best approx. generating indexed trellis
    output word confidence measure based on search-time scores

------------------------------------------------------------
FrontEnd:

 Input stream:
                 input type = waveform
               input source = microphone
        device API          = default
              sampling freq. = 16000 Hz
             threaded A/D-in = supported, on
       zero frames stripping = on
             silence cutting = on
                 level thres = 2000 / 32767
             zerocross thres = 60 / sec.
                 head margin = 300 msec.
                 tail margin = 400 msec.
        long-term DC removal = off
          reject short input = off

----------------------- System Information end -----------------------

    *************************************************************
    * NOTICE: The first input may not be recognized, since      *
    *         no initial CMN parameter is available on startup. *
    * for MFCC01*
    *************************************************************

STAT: AD-in thread created
<<< please speak >>>^C

--- (Edited on 9/29/2011 9:45 pm [GMT-0500] by Cerin) ---

Re: Running Julius on 64-bit Ubuntu 10.04
User: colbec
Date: 9/30/2011 12:13 pm

While your sound device may be working, Julius evidently cannot hear because it is not listening on the right sound system or the right device.

I can't speak for Ubuntu, but in the above output note these lines:

STAT: ###### initialize input device
Stat: adin_oss: device name = /dev/dsp

Your Julius is trying to use OSS. This might be right, it might not. Check the packages you have installed for ALSA, which is perhaps a bit more prevalent. If you are using ALSA, Julius should be compiled with the ALSA libraries present. If these are found, Julius would output something like 'adin_alsa: device name ...'. At that point you can use the ALSADEV environment variable, as described in the Julius documentation.

The second issue is whether /dev/dsp is right or not. What physical sound hardware are you using? Are you using a sound server like PulseAudio?

--- (Edited on 9/30/2011 12:13 pm [GMT-0500] by colbec) ---

Re: Running Julius on 64-bit Ubuntu 10.04
User: Cerin
Date: 9/30/2011 1:40 pm

How would I determine what audio system I'm using? It's a fairly vanilla Ubuntu 10.04 install, so I'm using whatever the default is. Looking at my packages, I have pulseaudio installed, but I also have gnome-alsamixer, alsa-base and alsa-oss.


The Hardware tab in Gnome's Sound Preferences dialog says I'm using "Digital Stereo (IEC958) Output + Analog Stereo Input".

Also, aplay gives me:

~$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: NVidia [HDA NVidia], device 0: Cirrus Analog [Cirrus Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: NVidia [HDA NVidia], device 1: Cirrus Digital [Cirrus Digital]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

And yes, I believe I'm using pulseaudio, because that's the default setup in Ubuntu.

Regards,

Chri

--- (Edited on 9/30/2011 1:40 pm [GMT-0500] by Cerin) ---

Re: Running Julius on 64-bit Ubuntu 10.04
User: colbec
Date: 9/30/2011 3:04 pm

OK we're making good progress here.

Take the device first. It looks like you have only one card, card 0, with two devices. Let's assume these are your speakers and the mic, which is most probably a regular wired headset plugged into the main card on the motherboard. This probably eliminates the possibility of a /dev/dsp1, 2, or 3, as might be the case with a USB or Bluetooth device, so /dev/dsp is likely not the issue.

This allows us to focus on the sound library. The ideal solution is to use an installation of Julius that is aware of your ALSA. Your Julius version is 4.1.2. I think there is a later precompiled version, 4.2, which has ALSA as the default and handles PulseAudio a lot better.

http://julius.sourceforge.jp/en_index.php

One thing to try before you go this route is to prefix your call to julius with padsp, a PulseAudio utility that redirects a program's OSS access (such as /dev/dsp) to the PulseAudio server.

$ padsp julius ...

If that fails, take a look at Julius 4.2. If you compile your own, you will be able to check that Julius can see your ALSA libraries.

--- (Edited on 9/30/2011 3:04 pm [GMT-0500] by colbec) ---

Re: Running Julius on 64-bit Ubuntu 10.04
User: Cerin
Date: 9/30/2011 9:32 pm

Thanks, using padsp managed to get the example controlapp working for me.


The remaining problem I have is with transcribing files. I think the main issue is that Julius is unable to process arbitrary WAV files, and requires that a WAV file have a specific sampling rate. For example, I ran:

julius -input file -filelist filelist.txt -C julian.jconf

and got the error:

Error: adin_file: sampling rate != 16000 (32000)
Error: adin_file: error in parsing wav header at hello.wav
Error: adin_file: failed to read speech data: "hello.wav"

So I used sox to resample my hello.wav to 16000.

sox hello.wav -r 16000 hello.16.wav resample

Then, I updated my filelist.txt to only contain "hello.16.wav".

However, now when I run julius, I get:

Error: adin_file: bytes per second != 32000 (16000)
Error: adin_file: error in parsing wav header at hello.16.wav
Error: adin_file: failed to read speech data: "hello.16.wav"

I'm really confused by this. Aren't "sample rate" and "bytes per second" the same thing? Googling doesn't really clarify, as some sites use them interchangeably, while others use slightly different terms (e.g. bits per second, or bit depth).

How can I set the sample rate to 16000 but also set bytes per second to 32000?


Regards,

Chri

--- (Edited on 9/30/2011 9:32 pm [GMT-0500] by Cerin) ---
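
On the question above: in a WAV header, sample rate and bytes per second are separate fields. Bytes per second (the byte rate) is the sample rate multiplied by the number of channels and by the bytes per sample, so mono 16-bit audio at 16000 Hz has a byte rate of 32000. Reading the message by analogy with the earlier "sampling rate != 16000 (32000)" error, where the value in parentheses was what the file actually contained, the resampled file appears to carry 16000 bytes per second, which would be consistent with 8-bit samples. A minimal sketch of the arithmetic in Python (wav_byte_rate is an illustrative name, not part of Julius or sox):

```python
def wav_byte_rate(sample_rate, channels, bits_per_sample):
    """Bytes per second as stored in a PCM WAV header."""
    return sample_rate * channels * (bits_per_sample // 8)


# 16 kHz, mono, 16-bit: the value Julius reports as expected.
print(wav_byte_rate(16000, 1, 16))  # 32000

# 16 kHz, mono, 8-bit: matches the value reported for hello.16.wav.
print(wav_byte_rate(16000, 1, 8))   # 16000
```

If that reading is right, the fix is to change the sample size to 16 bits, not to change the sample rate again.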

Re: Running Julius on 64-bit Ubuntu 10.04
User: colbec
Date: 10/1/2011 7:19 am

The julian.jconf file sets up a number of parameters that matter for file input. Have you checked these for compatibility with your sound files?

--- (Edited on 10/1/2011 7:19 am [GMT-0500] by colbec) ---

Re: Running Julius on 64-bit Ubuntu 10.04
User: Cerin
Date: 10/1/2011 9:18 am

Yes, I checked my jconf file, but there's no mention of a "bytes-per-second" parameter. I see the smpFreq parameter, and it's 16000, and I've resampled to 16000. I don't know where it's getting 32000.

These are the only uncommented options.

 

-dfa mediaplayer.dfa
-v mediaplayer.dict
-h /usr/share/julius-voxforge/acoustic/hmmdefs
-hlist /usr/share/julius-voxforge/acoustic/tiedlist
-penalty1 5.0        # first pass
-penalty2 20.0        # second pass
-iwcd1 max    # assign maximum likelihood of the same context
-gprune safe        # safe pruning, accurate but slow
-b2 200                 # beam width on 2nd pass (#words)
-sb 200.0        # score beam envelope threshold
-spmodel "sp"        # HMM model name
-iwsp            # append a skippable sp model at all word ends
-iwsppenalty -70.0    # transition penalty for the appended sp models
-smpFreq 16000        # sampling rate (Hz)

--- (Edited on 10/1/2011 9:18 am [GMT-0500] by Cerin) ---

Re: Running Julius on 64-bit Ubuntu 10.04
User: Cerin
Date: 10/1/2011 9:40 am

I think I found the sox call to do the bit rate and format conversion.

sox hello.wav -r 16000 -b 32 -c 1 hello.s32

Julius reports <search failed>, but I'm guessing that's just because it can't recognize this specific pronunciation of "hello".

--- (Edited on 10/1/2011 9:40 am [GMT-0500] by Cerin) ---
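
A note on the command above: with a ".s32" output name and "-b 32", sox writes headerless raw 32-bit signed samples. Julius's file input is documented as expecting 16-bit audio, so it may be misreading this data, and that could be one reason for the <search failed> result, although that is an assumption and not something confirmed in this thread. A minimal sketch, assuming Python 3 and a little-endian raw file (the usual case on x86), that rewrites such a file as a 16-bit mono WAV. The name s32_raw_to_wav16 is hypothetical:

```python
import struct
import wave


def s32_raw_to_wav16(raw_path, wav_path, sample_rate=16000):
    """Convert raw signed 32-bit little-endian mono samples to a 16-bit WAV.

    Returns the number of samples written.
    """
    with open(raw_path, "rb") as f:
        data = f.read()

    # Ignore any trailing bytes that do not form a whole 4-byte sample.
    usable = len(data) - (len(data) % 4)

    # Keep the top 16 bits of each 32-bit sample.
    samples16 = [s >> 16 for (s,) in struct.iter_unpack("<i", data[:usable])]

    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % len(samples16), *samples16))
    return len(samples16)
```

Alternatively, asking sox for 16-bit WAV output directly, for example "sox hello.wav -r 16000 -b 16 -c 1 hello.16.wav", should avoid the intermediate raw file.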

Re: Running Julius on 64-bit Ubuntu 10.04
User: Cerin
Date: 10/1/2011 9:55 am

I just ran Julius, with VoxForge's model, on 30 different audio files of people saying short, simple phrases. I expected Julius to miss a few, but it failed to transcribe *any* of them. That's a bit disappointing.

--- (Edited on 10/1/2011 9:55 am [GMT-0500] by Cerin) ---
