VoxForge
My post to the Simon SourceForge Forum
Hi bedahr,
I managed to translate enough of the GUI (using Google translation, and
recompiling the source) to get a basic understanding of what Simon
does/will do.
Some questions/clarifications (note: these comments are based on my
rough translations from German to English using Google, so I may be
misinterpreting things because of this ...) :
juliusd
The Julius Daemon (juliusd) seems like it starts *Julian* in server
mode, and then opens up a console that is essentially a replacement for
jcontrol. Does juliusd essentially act as an API to Julian for Simon?
i.e. Simon has no direct contact with Julian, and only gets Julian
recognition results from juliusd?
Juliusd has a configuration setting that points to a julian.jconf. I
guess that this is where julian gets pointed to its Acoustic Model. I
had to modify the juliusd settings as follows to get things to work:
Command: julian
Arguments: -input mic -C julian.jconf
I put the julian.jconf configuration file in the juliusd/bin directory, with the following configurations:
-h acoustic_model_files/hmmdefs
-hlist acoustic_model_files/tiedlist
I then copied the most current VoxForge Acoustic Models to the
juliusd/bin/acoustic_model_files directory, and started juliusd in its
own console as follows:
$ cd juliusd/bin/
$ juliusd
The output in the juliusd console looked like some of the output
usually seen when Julian starts up, so I think I got things set up
properly.
How does the “Send one Test Word” work? I'm not sure I understand what it is supposed to do...
Simon
System tab
In the Simon System tab, you set your Julian grammar files (the .voca
and .grammar files), and the corresponding system commands. So it seems
like you could speak a command, and Simon would send the request to the
operating system or application. Will Simon be sending the commands to
x-windows (like x-voice), or will it use some other method? Do you have
any sample command files? - I'd like to get an idea of what format they
are supposed to be in.
You can also point to a pronunciation dictionary and a prompts file. It
seems like Simon is a GUI front end that lets someone add new
words to a Julian Grammar file, using pronunciation information from
the pronunciation dictionary (so the user does not have to enter
phonemes by hand).
I seem to be able to connect to juliusd, but am not quite sure how
everything is supposed to work. When I click the Connect link in Simon,
I get a message in juliusd that the server was connected, but I am not
sure how recognition and the corresponding command execution are
supposed to take place.
Word add
In the main window for Simon, there is a Word icon that seems to let
you add a new word, and record speech audio corresponding to that word.
I assume that this is for the purpose of gathering acoustic data so
that the julian acoustic model can be adapted using audio for the new
word.
I'm wondering how any new words might be trained into the Acoustic
Model. If you need to use HTK, it would add a level of complexity for
new users, because they would have to download the HTK source
themselves and compile it (since HTK has distribution restrictions on
the source and binaries).
Word List
This seems like the repository for all the words added in word add ...
i.e. a place where you can manage all the words that have been added to
the system.
It also seems like it is set up to point to an Acoustic Model trainer (like HTK ...?) - is that what it is for?
Train
Seems like a way to prompt users to record sentences – using either
their own text that they import or some predefined text. But I am not
clear how this is to be used to update the acoustic models – since
there is no sentence repository GUI front-end, as there is for Word
Adds.
Implement
This seems like where you select the programs that Simon will be sending recognized commands to.
Thanks,
Ken
--- (Edited on 4/19/2007 11:24 am [GMT-0400] by kmaclean) ---
post from bedahr :
Hi Ken!
> I managed to translate enough of the GUI (using Google
> translation, and recompiling the source) to get a basic understanding
> of what Simon does/will do.
Wow, that was fast. Did you replace the German text, or did you add
the translation using Qt 4's translation features? We have to
provide a German version too, because our target users don't even know
English (yet).
> juliusd
>
> The Julius Daemon (juliusd) seems like it starts *Julian* in server mode, and then opens up a console
> that is essentially a replacement for jcontrol. Does juliusd essentially act as an API to Julian for
> Simon? i.e. Simon has no direct contact with Julian, and only gets Julian recognition results from
> juliusd?
Simon provides a very basic network-socket-based connection to the real
recognizer. Juliusd is more or less a sample daemon that uses this socket:
it parses the julius/julian output and writes it to that socket.
Simon doesn't really care whether the recognition is done by julius, julian or, for example, sphinx.
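As a rough illustration of that division of labour (not juliusd's actual code), here is a minimal Python sketch of a daemon that picks recognition results out of a recognizer's text output and writes them to a socket. The `sentence1:` prefix follows the result lines that Julius/Julian normally print, but the exact line format and the newline-terminated wire format are assumptions, and `extract_result` and `forward_results` are hypothetical names.

```python
import socket

RESULT_PREFIX = "sentence1:"  # assumed prefix of a final-result line


def extract_result(line):
    """Return the recognized words from one output line, or None."""
    line = line.strip()
    if not line.startswith(RESULT_PREFIX):
        return None
    words = line[len(RESULT_PREFIX):].split()
    # Drop the silence markers that wrap a recognized sentence.
    words = [w for w in words if w not in ("<s>", "</s>")]
    return " ".join(words) if words else None


def forward_results(lines, sock):
    """Send every recognized sentence in `lines` to `sock`, one per line."""
    sent = 0
    for line in lines:
        result = extract_result(line)
        if result is not None:
            sock.sendall((result + "\n").encode("utf-8"))
            sent += 1
    return sent


def demo():
    # A connected socket pair stands in for the daemon/client connection.
    daemon_end, client_end = socket.socketpair()
    try:
        output = [
            "pass1_best: <s> SIMON GOOGLE </s>",
            "sentence1: <s> SIMON GOOGLE </s>",
        ]
        forward_results(output, daemon_end)
        return client_end.recv(1024).decode("utf-8")
    finally:
        daemon_end.close()
        client_end.close()
```

Because only plain text crosses the socket in this sketch, the client never needs to know which recognizer produced it; supporting another engine would only mean swapping `extract_result`.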
> Juliusd has a configuration setting that points to a julian.jconf. I guess that this is where julian
> gets pointed to its Acoustic Model. I had to modify the juliusd settings as follows to get things to
> work:
> Command: julian
> Arguments: -input mic -C julian.jconf
The current settings are just for testing purposes and actually exploit a
bug: if the process exits immediately, juliusd probably won't notice
and will think it is still running. (We keep this minor bug to simplify
the dev. process, as we don't need a real recognizer that way. We just
have to supply a command that will fail immediately, for example a
program that doesn't even exist. That is why we used "juliano": it's
just a command that will return fast enough.)
> The output in the juliusd console looked like some of the output usually seen when Julian starts up,
> so I think I got things set up properly.
Generally juliusd just starts a process and monitors its output - it's really that simple.
> How does the “Send one Test Word” work? I'm not sure I understand what it is supposed to do...
It pretends that julius has recognized a word, namely the word/sequence that you enter in the dialog
(to test simon without needing a working recognizer).
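The two ideas above (start a process and watch its output; inject a fake result for testing) can be sketched in a few lines of Python. This is only an illustration under assumptions, not how juliusd is implemented: `run_and_monitor` and `send_test_word` are hypothetical names, and the `sentence1: <s> ... </s>` line shape is assumed.

```python
import subprocess
import sys


def run_and_monitor(command, on_line):
    """Start `command`, pass each output line to `on_line`, return exit code."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so nothing is missed
        text=True,
    )
    for line in process.stdout:
        on_line(line.rstrip("\n"))
    process.stdout.close()
    # wait() reports how the process ended, so an early exit is noticed.
    return process.wait()


def send_test_word(text, on_line):
    """Pretend the recognizer produced `text`; no recognizer process needed."""
    on_line("sentence1: <s> " + text + " </s>")


def demo():
    # A tiny Python child process stands in for the real recognizer.
    fake_recognizer = [sys.executable, "-c", "print('sentence1: <s> SIMON </s>')"]
    seen = []
    exit_code = run_and_monitor(fake_recognizer, seen.append)
    send_test_word("SIMON GOOGLE", seen.append)
    return seen, exit_code
```

Both paths feed the same `on_line` callback, which is what lets a test word exercise the client without any working recognition. Note that in this sketch a command that does not exist raises `FileNotFoundError` at start-up rather than going unnoticed.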
> Simon
>
> System tab
> In the Simon System tab, you set your Julian grammar files (the .voca and .grammar files), and the
> corresponding system commands. So it seems like you could speak a command, and Simon would send the
> request to the operating system or application. Will Simon be sending the commands to x-windows (like
> x-voice), or will it use some other method? Do you have any sample command files? - I'd like to get an
> idea of what format they are supposed to be in.
We currently have 3 types of commands:
Exec (executes commands)
Place (tells the OS to open the given place - also works with URLs and the like)
Special Keyword (works like escaping: for example, "simon simon" writes "simon", just as "\\" in a text produces "\").
The commands will be stored in an XML format (we are currently working on that).
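To give an idea of how three such command types could be represented, here is a minimal Python sketch. The XML layout, the attribute names and the `kwrite` program value are all hypothetical (the real format was still being worked on at the time), and `describe_action` only reports what would happen instead of executing anything.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical XML layout; not the actual Simon command file format.
SAMPLE_XML = """
<commands>
  <command name="Texteditor" type="exec" value="kwrite"/>
  <command name="Google" type="place" value="http://www.google.com"/>
  <command name="simon" type="special" value="simon"/>
</commands>
"""


@dataclass(frozen=True)
class Command:
    name: str
    kind: str   # "exec", "place" or "special"
    value: str


def load_commands(xml_text):
    """Parse the XML into a dict keyed by lower-cased command name."""
    root = ET.fromstring(xml_text.strip())
    commands = {}
    for node in root.iter("command"):
        cmd = Command(node.get("name"), node.get("type"), node.get("value"))
        commands[cmd.name.lower()] = cmd
    return commands


def describe_action(cmd):
    """Say what a command would do, without actually doing it."""
    if cmd.kind == "exec":
        return "run program: " + cmd.value
    if cmd.kind == "place":
        return "open place: " + cmd.value
    if cmd.kind == "special":
        return "type text: " + cmd.value
    raise ValueError("unknown command type: " + cmd.kind)
```

For example, `describe_action(load_commands(SAMPLE_XML)["google"])` reports that the place `http://www.google.com` would be opened.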
The .voca and .grammar files in the settings dialog are just stubs for
now. In the future, the client (simon) will negotiate the language
model with the server (juliusd), to simplify the training process
between different computers.
> You can also point to a pronunciation dictionary and a prompts file. It seems like Simon is a GUI
> front that can permit someone to add new words to a Julian Grammar file, using pronunciation
> information from the pronunciation dictionary (so the user does not have to enter phonemes by hand).
Correct. As a failsafe we will also provide a method to add words from scratch. But that's not yet implemented.
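The dictionary lookup described above could look roughly like the following Python sketch. It assumes an HTK-style dictionary line (`WORD [OUTPUT] phonemes...`) and a Julius-style .voca layout (`% CATEGORY` followed by `WORD phonemes` lines); the function names and the phoneme strings in the usage example are purely illustrative, not taken from a real lexicon.

```python
def parse_lexicon(lines):
    """Map each word to its phoneme list from HTK-style dictionary lines."""
    lexicon = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word, rest = fields[0], fields[1:]
        # Drop the optional "[OUTPUT]" symbol that follows the word.
        if rest[0].startswith("[") and rest[0].endswith("]"):
            rest = rest[1:]
        if rest:
            # Keep the first pronunciation if a word is listed twice.
            lexicon.setdefault(word.upper(), rest)
    return lexicon


def voca_entry(word, lexicon):
    """Build one .voca line for `word`; raises KeyError if it is unknown."""
    phonemes = lexicon[word.upper()]
    return word.upper() + "\t" + " ".join(phonemes)


def add_word(voca_lines, category, word, lexicon):
    """Return a copy of `voca_lines` with `word` added under `% category`."""
    entry = voca_entry(word, lexicon)
    header = "% " + category
    result = list(voca_lines)
    if header in result:
        result.insert(result.index(header) + 1, entry)
    else:
        result.extend([header, entry])
    return result
```

With a lexicon built from `["SAMPLE [SAMPLE] s ae m p ax l"]`, calling `add_word([], "COMMAND", "sample", lexicon)` yields a `% COMMAND` header followed by the new entry. A word missing from the dictionary raises `KeyError`, which is the case where adding a word from scratch would be needed.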
> I seem to be able to connect to juliusd, but am not quite sure how everything is supposed to work.
> When I click the Connect link in Simon, I get a message in juliusd that the server was connected, but
> I am not sure how recognition, and corresponding command execution is to take place.
Try to say something. If julius/julian recognizes it, it should send it to simon.
Simon should start typing or execute a command. (try "simon Texteditor" or "simon Google" for example).
If not, try the "Send one Test Word" in juliusd to send these commands. Simon should act accordingly.
> Word add
> In the main window for Simon, there is a Word icon that seems to
> let you add a new word, and record speech audio corresponding to that
> word. I assume that this is for the
> purpose of gathering acoustic data so that the julian acoustic model can be adapted using audio for the new word.
>
> I'm wondering how any new words might be trained into the Acoustic
> Model. If you need to use HTK, it would add a level of complexity for
> new users, because they would have
> to download the HTK source themselves and compile it (since HTK
> has distribution restrictions on the source and binaries).
Yes. The dialog tries to add a new word to the model. (We haven't
discussed how to deal with words which are not in the lexicon - we have
to add an option to add a custom pronunciation to the word - which will
be difficult if we want to keep it simple).
HTK seems the only option for now. Writing something similar from scratch is out of reach (at least for now ^^).
> Word List
> This seems like the repository for all the words added in word add
> ... i.e. a place where you can manage all the words that have been
> added to the system.
>
> It also seems like it is set up to point to an Acoustic Model trainer (like HTK ...?) - is that what it is for?
We want to provide a way to train certain words on their own. For example,
when we have the word "sample" and it isn't included in a training
text, we probably want to train just this single word.
So we can put together a custom "training text" with just the words that we select.
Try to fill the training list (top, center) with a few words and hit the "Train" button.
> Train
> Seems like a way to prompt users to record sentences – using
> either their own text that they import or some predefined text. But I
> am not clear how this is to be used to
> update the acoustic models – since there is no sentence repository GUI front-end, as there is for Word Adds.
What do you mean by "sentence repository GUI front-end"?
We can import texts and even put together our custom training model.
Then we record the needed utterances and train the language model with
the collected data.
The texts are stored in an XML format. You can find one sample text in trunk/texts/.
> Implement
> This seems like where you select the programs that Simon will be sending recognized commands to.
You mean "Ausführen" ("Execute")?
In that dialog we collect all the commands for the user to see. These are the commands that simon knows and reacts to.
Please note that we have a "magic word" which needs to be put in
front of the command. ATM this magic word is hardcoded to "simon".
Like this: "Google" will do nothing. "simon Google" will open Google.
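The magic-word rule, together with the "simon simon" escape mentioned earlier, can be sketched in Python as follows. This is only an illustration: `interpret` is a hypothetical name, the return values are made up for the example, and how Simon treats other input (ordinary dictation, for instance) is not covered here.

```python
MAGIC_WORD = "simon"  # hardcoded, as described above


def interpret(utterance, known_commands):
    """Return ("command", name), ("text", text) or ("ignore", None)."""
    words = utterance.split()
    if len(words) < 2 or words[0].lower() != MAGIC_WORD:
        # No magic word in front (or nothing after it): do nothing.
        return ("ignore", None)
    rest = " ".join(words[1:]).lower()
    if rest == MAGIC_WORD:
        # "simon simon" escapes the magic word and writes it literally.
        return ("text", MAGIC_WORD)
    if rest in known_commands:
        return ("command", rest)
    return ("ignore", None)
```

So `interpret("Google", {"google"})` is ignored, while `interpret("simon Google", {"google"})` selects the `google` command.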
Thanks for your help!
--bedahr
--- (Edited on 4/19/2007 11:25 am [GMT-0400] by kmaclean) ---
--- (Edited on 4/19/2007 11:26 am [GMT-0400] by kmaclean) ---
--- (Edited on 4/20/2007 1:40 pm [GMT-0400] by kmaclean) ---
--- (Edited on 4/20/2007 1:41 pm [GMT-0400] by kmaclean) ---
--- (Edited on 4/20/2007 1:42 pm [GMT-0400] by kmaclean) ---