VoxForge
A Dialog Manager is one component of a Speech Recognition System.
Telephony and Command & Control Dialog Managers
A Dialog Manager used in Telephony applications (IVR, Interactive Voice Response), and in some desktop Command and Control applications, assigns meaning to the words recognized by the Speech Recognition Engine, determines how the utterance fits into the dialog spoken so far, and decides what to do next. It might need to retrieve information from an external source. If a response to the user is required, it chooses the words and phrases for that response and passes them to the Text-to-Speech System, which speaks them to the user.
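The sequence of steps described above (assign meaning, fit the utterance into the dialog, decide what to do, possibly fetch external data, then hand a response to Text-to-Speech) can be sketched in Python as a single dialog turn. This is a minimal, hypothetical weather-line example, not the design of any particular IVR product: `interpret`, `lookup_forecast`, `speak`, the city list and the forecast table are illustrative stand-ins for the recognizer output, an external data source and a Text-to-Speech system.

```python
# Hypothetical stand-ins for a real vocabulary and an external data source.
KNOWN_CITIES = {"paris", "tokyo", "toronto"}
FORECASTS = {"paris": "sunny", "tokyo": "rainy", "toronto": "snowy"}


def interpret(words):
    """Assign meaning: pick a known city name out of the recognized words."""
    for word in words:
        if word.lower() in KNOWN_CITIES:
            return {"city": word.lower()}
    return {}


def lookup_forecast(city):
    """Stand-in for retrieving information from an external source."""
    return FORECASTS.get(city, "unknown")


def speak(text):
    """Stand-in for handing the chosen response to a Text-to-Speech system."""
    print("TTS:", text)
    return text


def handle_turn(state, words):
    """Process one recognized utterance and return the response that was spoken."""
    state.update(interpret(words))  # fit the utterance into the dialog so far
    if "city" not in state:  # decide what to do next
        return speak("Which city would you like the weather for?")
    forecast = lookup_forecast(state["city"])
    return speak(f"The weather in {state['city'].title()} is {forecast}.")


state = {}
handle_turn(state, ["weather", "please"])  # prints: TTS: Which city would you like the weather for?
handle_turn(state, ["in", "Tokyo"])  # prints: TTS: The weather in Tokyo is rainy.
```

Keeping the dialog state in a dictionary that survives between turns is what lets the second utterance ("in Tokyo") complete the request started by the first.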
Dictation Dialog Manager
A Dictation Dialog Manager typically takes the words recognized by the Speech Recognition Engine and types the corresponding text on your computer screen. It may also have some Command and Control elements, but these are usually limited to the kinds of commands found in a word processing program. It usually responds to the user with on-screen text rather than through a Text-to-Speech System.
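The core behaviour of a Dictation Dialog Manager can be sketched as: if the recognized phrase is one of a small set of word-processing commands, apply it; otherwise, type the words into the document. This Python sketch is illustrative only; the command phrases ("new line", "new paragraph", "delete last word") and the `apply_utterance` function are hypothetical and not taken from any dictation product.

```python
import re

# Hypothetical editing commands; real dictation products define their own phrases.
INSERT_COMMANDS = {"new line": "\n", "new paragraph": "\n\n"}


def apply_utterance(document, utterance):
    """Type dictated words into `document`, or apply a word-processing command."""
    phrase = utterance.strip().lower()
    if phrase in INSERT_COMMANDS:
        return document + INSERT_COMMANDS[phrase]
    if phrase == "delete last word":
        # Remove the final run of non-space characters, keeping any earlier line break
        return re.sub(r"\S+$", "", document.rstrip()).rstrip(" ")
    separator = "" if not document or document.endswith("\n") else " "
    return document + separator + utterance.strip()


doc = ""
for spoken in ["Dear Sam", "new line", "thanks for the note", "delete last word"]:
    doc = apply_utterance(doc, spoken)
print(repr(doc))  # 'Dear Sam\nthanks for the'
```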
Examples
Examples of Telephony Dialog Managers include:
Examples of Command & Control Dialog Managers:
Examples of Dictation Dialog Managers, with Command & Control elements, would be:
You can also write a domain-specific application to perform Dialog Manager-like tasks using a traditional programming language (C, C++, Java, etc.) or a scripting language (Perl, Python, Ruby, etc.). For example:
Here is a video that describes an approach (Linux – remote control with a voice) that uses Voximp as the dialog manager (Voximp itself uses pocketsphinx), xbindkeys to bind programs to keys, and zenity to display notifications.
From the Voximp home page:
Voximp is an application which allows simple voice commands to be bound to spawn programs or simulate key/mouse presses. It's written in python and uses pocketsphinx for voice-recognition.
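The idea of binding spoken commands to programs can be sketched in Python as a lookup table from phrase to command line. This is not Voximp's own code or configuration format; the `BINDINGS` table, `command_for` and `handle_phrase` are hypothetical names, and the phrases would come from whatever recognizer (such as pocketsphinx) produces the text.

```python
import shlex
import subprocess

# Hypothetical bindings from a spoken phrase to the program it should spawn.
BINDINGS = {
    "open terminal": "xterm",
    "open browser": "firefox --new-window",
}


def command_for(phrase):
    """Return the argument list bound to a recognized phrase, or None if unbound."""
    command = BINDINGS.get(phrase.strip().lower())
    return shlex.split(command) if command else None


def handle_phrase(phrase):
    """Spawn the bound program without waiting for it to exit."""
    argv = command_for(phrase)
    if argv is not None:
        subprocess.Popen(argv)
    return argv
```

Using `subprocess.Popen` rather than `subprocess.run` keeps the listener free to recognize the next command while the spawned program is still running.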
From the xbindkeys web page:
xbindkeys is a program that allows you to launch shell commands with your keyboard or your mouse under X Window. It links commands to keys or mouse buttons, using a configuration file. It's independent of the window manager and can capture all keyboard keys (e.g. Power, Wake...).
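An xbindkeys configuration file (by default `~/.xbindkeysrc`) lists each shell command in double quotes, followed on the next line by the key combination that triggers it. The Python helper below only renders that layout from a dictionary; the function name and the sample binding are illustrative, and the xbindkeys documentation should be checked for the full key-name syntax.

```python
def xbindkeys_config(bindings):
    """Render {shell command: key combination} pairs as .xbindkeysrc entries."""
    lines = []
    for command, keys in bindings.items():
        lines.append(f'"{command}"')  # the command to launch, in double quotes
        lines.append(f"  {keys}")  # the key combination that triggers it
    return "\n".join(lines) + "\n"


print(xbindkeys_config({"xterm": "control+shift + q"}), end="")
```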
From the zenity web page:
Zenity is a tool that allows you to display Gtk+ dialog boxes from the command line and through shell scripts. It is similar to gdialog, but is intended to be saner. It comes from the same family as dialog, Xdialog, and cdialog, but it surpasses those projects by having a cooler name.
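In a voice-command setup like the one in the video, the dialog manager can confirm each recognized command with a desktop notification. A minimal Python sketch, assuming zenity's `--notification` mode with `--text=` for the message; `notification_command` and `notify` are hypothetical helper names.

```python
import shutil
import subprocess


def notification_command(message):
    """Build the zenity command line that shows `message` as a notification."""
    return ["zenity", "--notification", f"--text={message}"]


def notify(message):
    """Launch the notification if zenity is installed; return whether it was launched."""
    if shutil.which("zenity") is None:
        return False
    subprocess.Popen(notification_command(message))
    return True
```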
Here is an article (Google translated from Russian) that gives another example of using Julius with Python:
$ vi sample.voca
% NS_B
<s> sil
% NS_E
</s> sil
% ID
DO d uw
% COMMAND
PLAY p l ey
NEXT n eh k s t
PREV p r iy v
SILENCE s ay l ax n s
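In the .voca file, each line starting with `%` names a word category that the grammar can refer to, and each following line gives a word (as Julius will output it) followed by its phoneme sequence. A short Python sketch that reads this layout into a dictionary makes the structure explicit; `parse_voca` is a hypothetical helper, not part of the Julius tools, and it skips blank lines and lines starting with `#` (treated here as comments).

```python
def parse_voca(text):
    """Parse Julius .voca text into {category: [(word, [phonemes, ...]), ...]}."""
    categories = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith("%"):
            current = line[1:].strip()  # "% COMMAND" opens a new category
            categories.setdefault(current, [])
        else:
            word, *phonemes = line.split()  # first field is the word, the rest its phonemes
            categories.setdefault(current, []).append((word, phonemes))
    return categories


SAMPLE = """\
% ID
DO d uw
% COMMAND
PLAY p l ey
NEXT n eh k s t
"""
print(parse_voca(SAMPLE)["COMMAND"])
# [('PLAY', ['p', 'l', 'ey']), ('NEXT', ['n', 'eh', 'k', 's', 't'])]
```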
$ vi sample.grammar
S: NS_B ID COMMAND NS_E
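The grammar rule says that a sentence (`S`) is one word from each of the categories NS_B, ID, COMMAND and NS_E, in that order. Because this rule is a single, non-recursive sequence, the sentences it accepts can be listed by picking one word per category. The Python sketch below does that with the categories from sample.voca (phonemes omitted); `accepted_sentences` is an illustrative helper, not a Julius tool.

```python
from itertools import product

# Word categories from sample.voca, with the phonemes left out.
CATEGORIES = {
    "NS_B": ["<s>"],
    "NS_E": ["</s>"],
    "ID": ["DO"],
    "COMMAND": ["PLAY", "NEXT", "PREV", "SILENCE"],
}
# The single rule in sample.grammar: S: NS_B ID COMMAND NS_E
RULE = ["NS_B", "ID", "COMMAND", "NS_E"]


def accepted_sentences(rule, categories):
    """Enumerate every word sequence the rule allows, one word per category."""
    return [" ".join(words) for words in product(*(categories[c] for c in rule))]


for sentence in accepted_sentences(RULE, CATEGORIES):
    print(sentence)
# <s> DO PLAY </s>
# <s> DO NEXT </s>
# <s> DO PREV </s>
# <s> DO SILENCE </s>
```

This also explains the shape of a recognition result: every accepted utterance starts with `<s>` and `DO` and ends with `</s>`, with the command word in between.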
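Compile the grammar (mkdfa turns sample.grammar and sample.voca into the sample.dfa and sample.dict files that Julius loads):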
$ mkdfa sample
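Test it with Julius (julian.jconf is not shown here; it has to point Julius at sample.dfa, sample.dict and an acoustic model):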
$ julius -input mic -C julian.jconf
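$ vi command.py
#!/usr/bin/env python3
import subprocess
import sys

COMMANDS = {
    'play': 'audacious2 -p',     # start playback
    'silence': 'audacious2 -u',  # pause playback
    'next': 'audacious2 -f',     # skip to the next track
    'prev': 'audacious2 -r',     # skip to the previous track
}

def parse(line):
    # Julius prints each result as a line such as "sentence1: <s> DO PLAY </s>"
    params = [param.lower() for param in line.split()]
    if params[:1] == ['sentence1:']:
        for word in params[1:]:  # the COMMAND word follows <s> and DO
            if word in COMMANDS:
                subprocess.Popen(COMMANDS[word], shell=True)

for line in sys.stdin:  # recognizer output piped in from Julius
    parse(line)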
Make the script executable, then run it as follows:
$ chmod +x command.py
$ julius -quiet -input mic -C julian.jconf 2>/dev/null | ./command.py