VoxForge
My purpose in using speech recognition is to recognise one sentence, send the result to another program, and then exit.
I would like to call Julian from a C file (the OS is Linux) using a system command or something equivalent, and then recognise one sentence, with the result either sent back to the C program or saved to a text file that I can load from the C program afterwards. I am a little unsure about how to do this, and need help.
My suggestion is to generate the grammar and record a voice file with a matching sentence. Then I would execute Julian from the C file and make it listen to the voice file to adjust to the input. Then it should record voice from a mic and try to recognise the input. After this it should send the result or write it to a text file, and then exit.
How can I do this?
- Grant
P.S. Sorry if I have double-posted this question; it did not seem to work the first time.
--- (Edited on 3/28/2007 4:33 am [GMT-0500] by Visitor) ---
Hi Grant,
The current VoxForge Acoustic Model is only geared to Grammar based speech recognition, not dictation based recognition (which uses Language Models). So the sentence that you want to recognize must be covered by the acceptable words and phrases in your Grammar file. Step 1 of the VoxForge Acoustic Model Creation tutorial will give you some background on creating Grammars for your particular task. Note that the phonemes used in the Tutorial example are slightly different from those used in the VoxForge Acoustic Model, so if you are using the VoxForge Acoustic Model for recognition, be sure to use the pronunciations used in the VoxForge Lexicon.
You should be able to run Julian with a system call from a scripting language (or a C program), and have the results returned to your program through Julian's standard output, which your program reads as its input (in Perl you can use the 'system' command). See the '-input rawfile' and '-filelist file' parameters in Julian to specify the file you want to recognize speech from. You can use a Julian config file to reduce the number of parameters you need in the 'system' command, and run Julian in '-quiet' mode to reduce the amount of output it generates.
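A minimal sketch in C of the system-call approach, using popen() so the C program can read what Julian prints. It assumes that the result line starts with "sentence1:" and wraps the words in <s> and </s> markers, so check that against the output of your own Julian build. The function names are only for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen() and pclose() */
#include <stdio.h>
#include <string.h>

/* Copy the recognised words from one line of Julian output into `out`.
 * Assumes result lines look like "sentence1: <s> DIAL ONE </s>".
 * Returns 1 if `line` was a result line, 0 otherwise. */
int extract_sentence(const char *line, char *out, size_t out_size)
{
    static const char prefix[] = "sentence1:";
    const char *p;
    size_t len, used = 0;

    if (out_size == 0 || strncmp(line, prefix, sizeof prefix - 1) != 0)
        return 0;
    out[0] = '\0';
    p = line + sizeof prefix - 1;
    for (;;) {
        p += strspn(p, " \t\r\n");          /* skip whitespace */
        if (*p == '\0')
            break;
        len = strcspn(p, " \t\r\n");        /* length of the next token */
        /* drop the sentence start/end markers, keep everything else */
        if (!((len == 3 && strncmp(p, "<s>", 3) == 0) ||
              (len == 4 && strncmp(p, "</s>", 4) == 0))) {
            if (used + len + 2 > out_size)
                break;                      /* no room left in `out` */
            if (used > 0)
                out[used++] = ' ';
            memcpy(out + used, p, len);
            used += len;
            out[used] = '\0';
        }
        p += len;
    }
    return 1;
}

/* Run `cmd` (a complete Julian command line) and store the first
 * recognised sentence in `out`. Returns 0 on success, -1 otherwise. */
int recognise_once(const char *cmd, char *out, size_t out_size)
{
    char line[1024];
    int found = 0;
    FILE *fp = popen(cmd, "r");

    if (fp == NULL)
        return -1;
    /* Keep reading to the end so the child process can exit normally. */
    while (fgets(line, sizeof line, fp) != NULL) {
        if (!found && extract_sentence(line, out, out_size))
            found = 1;
    }
    pclose(fp);
    return found ? 0 : -1;
}
```

You would call it with something like recognise_once("julian -C julian.jconf -input rawfile -filelist wavlist.txt -quiet", buf, sizeof buf), where julian.jconf and wavlist.txt are placeholder names for your own config file and file list. With file input Julian exits when the list is finished, so the read loop ends on its own.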
Another way would be to run Julian in server mode (look at the section called 'Server Module Mode'), and use the adintool as the 'client'. The adintool can take live audio, or a speech audio file, from one computer, and transmit it to the Julian server on another computer (or on a specific port on the same computer). You then use a different port to send commands to, and receive results from, the Julian server. Just create a loop to continuously monitor the port and perform actions depending on the recognition results.
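A rough sketch in C of such a monitoring loop. It rests on assumptions you should verify against your own Julian server: that each message from the server ends with a line containing only ".", that recognition results arrive in a <RECOGOUT> message, and that each recognised word appears as a WORD="..." attribute. The function names are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L   /* for fdopen() */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Open a TCP connection to the Julian server; returns a socket or -1. */
int connect_to_julian(const char *ip, int port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Gather every WORD="..." value in `msg` into `out`, space separated.
 * Returns the number of words copied. */
int collect_words(const char *msg, char *out, size_t out_size)
{
    static const char key[] = " WORD=\"";
    const char *p = msg;
    size_t used = 0;
    int count = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';
    while ((p = strstr(p, key)) != NULL) {
        const char *start = p + sizeof key - 1;
        const char *end = strchr(start, '"');
        size_t len;

        if (end == NULL)
            break;                       /* unterminated attribute */
        len = (size_t)(end - start);
        p = end + 1;
        if (len == 0)
            continue;
        if (used + len + 2 > out_size)
            break;                       /* no room left in `out` */
        if (used > 0)
            out[used++] = ' ';
        memcpy(out + used, start, len);
        used += len;
        out[used] = '\0';
        count++;
    }
    return count;
}

/* Block until the next recognition result arrives on `fp`.
 * Returns 1 with the words in `out`, or 0 when the stream ends. */
int next_sentence(FILE *fp, char *out, size_t out_size)
{
    char line[1024];
    char msg[8192];

    msg[0] = '\0';
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strcmp(line, ".\n") == 0) {          /* end of one message */
            if (strstr(msg, "<RECOGOUT>") != NULL &&
                collect_words(msg, out, out_size) > 0)
                return 1;
            msg[0] = '\0';                       /* not a result, skip it */
        } else if (strlen(msg) + strlen(line) < sizeof msg) {
            strcat(msg, line);
        }
    }
    return 0;
}
```

To use it, call connect_to_julian() with the server's address and module port (10500 is, as far as I know, the default, but treat that as an assumption), wrap the socket with fdopen(fd, "r"), and then loop on next_sentence(), acting on each sentence it returns.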
Hope this helps,
Ken
--- (Edited on 3/28/2007 12:10 pm [GMT-0400] by kmaclean) ---
Ah, I see.. Thank you, but I have a few more questions then.
My first question was maybe not so clear. What I asked for was whether it is possible to avoid the first incoming speech being used to identify and adjust the level of the incoming voice. I mean, can I make a file which Julian uses to do this, and then have it interpret directly from the first incoming voice? I am aware that this can result in worse interpretations, but in the same room and with the same person's voice it might not make a big difference?
stdin, of course, but I am a little in doubt about which data, and how much of it, is returned from Julian. But the '-quiet' mode will hopefully decrease the amount of data. I'll find out.
Anyway, I think the best choice for me is to use server mode, so I can run an infinite loop while the program is running and continuously listen for possible incoming voice. If I do this, I'm a little unsure about how to only send/save interpreted data, as described briefly for adintool. The command should be "% adintool -in mic -out file -filename data", but how does it know where to find and use the grammar I have created for Julian? I do not see any possible input for this in the synopsis. Also, to interpret one legal sentence (grammar related), is "-oneshot" (Record only the first speech segment) the best choice, and is it right to read segment as meaning sentence?
- Grant
--- (Edited on 4/ 2/2007 4:30 am [GMT-0500] by Visitor) ---
Hi Grant,
>What I asked for was whether it is possible to avoid the first incoming speech being used to identify and adjust the level of the incoming voice
You need to create a cmn file and load it when you start up Julian - see this post for some info, and the Julian manual under the "-cmnsave filename" and "-cmnload filename" parameters.
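As a small sketch of how a C program might switch between the two parameters, the helper below builds the Julian command line: once with '-cmnsave' to create the cmn file from live speech, and afterwards with '-cmnload' to start from it. The helper name, the file names, and the exact combination of options are assumptions for illustration, so check them against the Julian manual.

```c
#include <stdio.h>

/* Build a Julian command line for microphone input.
 * If `save` is non-zero the cmn data is written with -cmnsave,
 * otherwise it is read at start-up with -cmnload.
 * Returns 0 on success, -1 if `buf` is too small. */
int build_julian_cmd(char *buf, size_t size, const char *jconf,
                     const char *cmn_file, int save)
{
    int n = snprintf(buf, size, "julian -C %s -input mic %s %s -quiet",
                     jconf, save ? "-cmnsave" : "-cmnload", cmn_file);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For example, build_julian_cmd(cmd, sizeof cmd, "julian.jconf", "room.cmn", 0) fills cmd with a command that loads room.cmn at start-up. Both file names are placeholders.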
>I'm a little unsure about how to only send/save interpreted data, as described briefly for adintool. The command should be "% adintool -in mic -out file -filename data", but how does it know where to find and use the grammar I have created for Julian? I do not see any possible input for this in the synopsis.
adintool is only an audio client that transmits the speech audio to the Julian server (i.e. it's like a soft-phone), so it does not need to know about your grammar. Julian is the server, and it is Julian that loads the grammar; jcontrol lets you send commands to, and receive recognition results from, the Julian server. Usually, you want adintool to run continuously and let Julian recognize the speech, while your program loop monitors the output from jcontrol.
You might be better off using the regular version of Julian to start off with, before moving to server mode.
Hope that helps,
Ken
--- (Edited on 4/ 2/2007 2:07 pm [GMT-0400] by kmaclean) ---