General Discussion

Log Phenome?
User: Tom J.
Date: 11/18/2020 11:17 am
Views: 6163
Rating: 0

I'm currently working with Julius. An initial thought I had was to log the phonemes as heard by Julius so I could spot problems with my dialect vs. the model.

It just dawned on me that my AI really doesn't need to be trained on human-readable responses. If there were a way to use Julius, or another speech recognition tool for that matter, to log the phonemes, then the API I've written could return them as a string.

My AI could have a limitless vocabulary, and without the step of passing through the dictionary the string should come back very quickly.

Now I'm wondering if simply writing a .voca file with just the individual phonemes as matches, rather than words matched to groups of phonemes, would achieve this.

Anybody ever explore this particular rabbit hole?

--- (Edited on 11/18/2020 11:17 am [GMT-0600] by Tom J.) ---

Re: Log Phenome?
User: kmaclean
Date: 11/18/2020 11:36 am
Views: 41
Rating: 1

take a look at wav2letter - which started out as a way of predicting letters directly from the raw waveform

 

--- (Edited on 11/18/2020 12:36 pm [GMT-0500] by kmaclean) ---

Re: Log Phenome?
User: Tom J.
Date: 11/20/2020 10:57 am
Views: 91
Rating: 0

Thank you kmaclean,

I made a direct phoneme-to-phoneme dictionary and it works, but without triphones to match, Julius never appears to hear the same thing twice. There are also no distinct spaces between words, since every phoneme is considered a word.

In all I'm glad I conducted the experiment.

I'm going to take the wav2letter suggestion next and play with it, but I can't help wondering whether Julius has options in the jconf to set the spacing of words. If I could get an additional space between words, it would be a matter of iterating over the string in C++ and eliminating single spaces only.
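
A rough sketch of that string pass, assuming the output really could be made to arrive with one space between phonemes and two or more between words (nothing in this thread confirms Julius can do that). The function name collapsePhonemeSpacing is made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: drops the single spaces that separate phonemes and
// turns every run of two or more spaces into one word boundary.
std::string collapsePhonemeSpacing(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != ' ') {
            out += in[i++];
            continue;
        }
        // Measure this run of spaces.
        std::size_t run = 0;
        while (i < in.size() && in[i] == ' ') {
            ++run;
            ++i;
        }
        // A single space was only a phoneme separator, so it is dropped.
        // A longer run is kept as one space, unless it is leading or trailing.
        if (run >= 2 && !out.empty() && i < in.size()) {
            out += ' ';
        }
    }
    return out;
}
```

With the input "hh eh l ow  w er l d" this returns "hhehlow werld". Note that gluing phonemes together like this throws away where one phoneme ends and the next begins, so a different joining character may be a better choice if the phonemes need to be recovered later.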

Then perhaps write a dictionary that's more or less matching chunks of words.

--- (Edited on 11/20/2020 10:57 am [GMT-0600] by Tom J.) ---

--- (Edited on 11/20/2020 10:59 am [GMT-0600] by Tom J.) ---

Re: Log Phenome?
User: kmaclean
Date: 12/1/2020 9:51 am
Views: 3
Rating: 0

> If I could get an additional space between words, it would be a matter of iterating over the string in C++ and eliminating single spaces only.

>Then perhaps write a dictionary that's more or less matching chunks of words.

I'm still not clear on what you are trying to do... but Grammar files can be designed to return as long a string as you want.

--- (Edited on 12/1/2020 10:51 am [GMT-0500] by kmaclean) ---

Re: Log Phenome?
User: Tom J.
Date: 12/2/2020 10:51 am
Views: 203
Rating: 0

>I'm still not clear on what you are trying to do...

It's a little robot with an AI, and I have several more planned. I've got Julius running on an SBC and now I'm working through the challenges of fine tuning: getting a thorough dictionary that's as lean as I can make it, building an API, and making a language model with HTK. The tiny computer is unable to run HTK as it's an ARM processor, so I'm building the LM on a big computer.

Right now I'm experimenting with different ways to return strings with Julius and sorting through the big VoxForge phoneme file to locate words for my dictionary.

I really wish there were a C++ header file to pause/start Julius and return strings, but I haven't found one, so I'm writing one. It's very crude, as I'm relatively new to writing code.

I've found the Julius server and tested it, so maybe there's something in there I can use. It's obvious I'm not breaking new ground here, because the sample voca is literally for using Julius this way ("phone steve" and such).
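
For the "return strings" half of such a header, here is a minimal sketch. It assumes Julius is started in module (server) mode and sends XML-like result messages whose WHYPO elements carry a WORD="..." attribute, which is how I understand the module-mode output. The socket code is left out, and extractWords is a made-up name, not part of any Julius API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collects the value of every WORD="..." attribute
// found in one message received from Julius in module mode.
std::vector<std::string> extractWords(const std::string& message) {
    const std::string key = " WORD=\"";
    std::vector<std::string> words;
    std::size_t pos = 0;
    while ((pos = message.find(key, pos)) != std::string::npos) {
        pos += key.size();                        // first character of the value
        std::size_t end = message.find('"', pos); // closing quote
        if (end == std::string::npos) {
            break;                                // truncated message, stop here
        }
        words.push_back(message.substr(pos, end - pos));
        pos = end + 1;
    }
    return words;
}
```

Fed a message containing WHYPO elements for "phone" and "steve", it returns those two words in order. For the pause/start half, my understanding is that a module-mode client can send text commands such as PAUSE and RESUME over the same connection, but check that against the Julius documentation for your version before relying on it.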

I did have a couple of short verbal conversations with it already, but I've still got a long road to travel.

--- (Edited on 12/2/2020 10:51 am [GMT-0600] by Tom J.) ---

Re: Log Phenome?
User: Tom J.
Date: 1/3/2021 2:22 pm
Views: 241
Rating: 0

Almost done assembling my .voca file and I had a quick question about order.

If Julius is looking for a match and finds it, will it move on or keep looking at the rest of the words?

I'm asking because I got a list of the 100 most frequently spoken words in English and moved those words to the top of the .voca file. If it's beneficial I can get the top 1000, but that's a good deal of work to do if it's to no avail.

 

--- (Edited on 1/3/2021 2:22 pm [GMT-0600] by Tom J.) ---

Re: Log Phenome?
User: Tom J.
Date: 4/23/2021 3:52 am
Views: 31
Rating: 0

Now I'm pondering this approach as a training aid. My little robot could ask me to spell a new word to store as a string, then kill Julius and start it back up with a different jconf using a straight phoneme vocabulary. It would then prompt me to say the word a couple of times, and it could theoretically write new lines to the worded voca, recompile the vocabulary, and restart Julius with the new word added the way I said it.
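
The "write new lines to the worded voca" step might look roughly like this. It only builds the text of one entry (the word, then its phonemes separated by spaces); makeVocaEntry is a made-up name, and appending the line to the file and recompiling are left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds one .voca entry from the spelled word and the
// phonemes that were heard, e.g. "robot<TAB>r ow b aa t".
std::string makeVocaEntry(const std::string& word,
                          const std::vector<std::string>& phonemes) {
    std::string line = word + "\t";
    for (std::size_t i = 0; i < phonemes.size(); ++i) {
        if (i > 0) {
            line += ' ';  // phonemes are separated by single spaces
        }
        line += phonemes[i];
    }
    return line;
}
```

One thing to watch: entries in a .voca file belong to whichever "% CATEGORY" header precedes them, so a line appended to the end of the file lands in the last category, which may not be the one you want.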

Another experiment on the horizon.

--- (Edited on 4/23/2021 3:52 am [GMT-0500] by Tom J.) ---

Re: Log Phenome?
User: Tom J.
Date: 4/24/2021 2:00 pm
Views: 38
Rating: 0

Assembling my phoneme dictionary to experiment with today.

Armed with a dictionary from one of the Julius versions containing 265353 word entries, I've found 39 phonemes myself:

aa    
ae
ah 
ao  
aw   
ay       
ch    
dh  
eh  
er   
ey   
hh  
ih   
iy  
jh    
ng  
ow     
oy   
sh     
th     
uh   
uw    
zh     
b     
d   
f    
g    
k     
l     
m     
n      
p     
r       
s        
t      
v      
w      
y      
z      

Looking through an old post here on VoxForge by user juge:

http://www.voxforge.org/home/forums/message-boards/speech-recognition-engines/create-phoneme-recogniser-using-voxforge-models-and-htk

I see he has identified an additional 3 phonemes (ax, dx, ix), but they do not appear in what I believe to be a thorough word list.

The post was from 2009, so maybe those 3 were from an old version. Could anybody tell me if they are still relevant?

I'm using the jconf and related files from the Quick 2018 version. Here is the output of julius --version on my computer:
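
For anyone wanting to repeat the check, here is a sketch of one way to collect the distinct phonemes from a dictionary. It assumes every line looks like "WORD [OUTPUT] ph ph ..." with no spaces inside the brackets. That layout is my assumption about the dictionary, and collectPhonemes is a made-up name.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: returns the set of distinct phonemes used in the
// dictionary text. The first token on each line is taken to be the word,
// and any token starting with '[' is taken to be the output-string column.
std::set<std::string> collectPhonemes(const std::string& dictText) {
    std::set<std::string> phonemes;
    std::istringstream lines(dictText);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream tokens(line);
        std::string token;
        if (!(tokens >> token)) {
            continue;  // blank line
        }
        while (tokens >> token) {
            if (token[0] == '[') {
                continue;  // skip the bracketed output string
            }
            phonemes.insert(token);
        }
    }
    return phonemes;
}
```

Calling count("ax") on the returned set (and likewise for "dx" and "ix") then shows whether a given phoneme occurs anywhere in the word list.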

tom@tom-Latitude-E6420:~$ julius --version
JuliusLib rev.4.2.2 (fast)

Engine specification:
 -  Base setup   : fast
 -  Supported LM : DFA, N-gram, Word
 -  Extension    :
 -  Compiled by  : gcc -g -O2 -fdebug-prefix-map=/build/julius-BbVgaR/julius-4.2.2=. -fstack-protector-strong -Wformat -Werror=format-security

Library configuration: version 4.2.2
 - Audio input
    primary A/D-in driver   : alsa (Advanced Linux Sound Architecture)
    available drivers       : alsa oss pulseaudio
    wavefile formats        : RAW and WAV only
    max. length of an input : 320000 samples, 150 words
 - Language Model
    class N-gram support    : yes
    word id unit            : short (2 bytes)
 - Acoustic Model
    multi-path treatment    : autodetect
 - External library
    file decompression by   : zlib library
 - Process hangling
    fork on adinnet input   : no

 

Thanks in advance, any input would be appreciated, Tom

 

 

--- (Edited on 4/24/2021 2:00 pm [GMT-0500] by Tom J.) ---

Re: Log Phenome?
User: kmaclean
Date: 4/26/2021 8:04 am
Views: 89
Rating: 0

VoxForge phoneset

--- (Edited on 4/26/2021 9:04 am [GMT-0400] by kmaclean) ---

Re: Log Phenome?
User: Tom J.
Date: 5/2/2021 11:12 am
Views: 22
Rating: 0

Thank you so much for showing me the phoneset. I never would have found it, as I had the terminology wrong when I searched. Now I can see I did have it correct.

I made an alphabet.voca file. It seems to work well with the exception of "z", and that could be a microphone issue on my end.

I still need to tweak the grammar file, then I'll write a little function for the robot and test speech training. This likely will not work, but I'm optimistic. It would be pretty neat to train new words verbally on the hardware that would be interpreting them.

Here's the main part of my alphabet.voca in case it helps someone else:

% WORD
a   ey
b   b iy
c   s iy
d   d iy
e   iy
f   eh f
g   jh iy
h   ey ch
i   ay
j   jh ey
k   k ey
l   eh l
m   eh m
n   eh n
o   ow
p   p iy
q   k y uw
r   aa r
s   eh s
t   t iy
u   y uw
v   v iy
w   d ah b ah l y uw
x   eh k s
y   hh w ay
y   w ay
z   z iy

alphabet.grammar:

S: NS_B WORD_LOOP NS_E
WORD_LOOP: WORD

--- (Edited on 5/2/2021 11:12 am [GMT-0500] by Tom J.) ---

 

--- (Edited on 5/2/2021 11:23 am [GMT-0500] by Tom J.) ---
