- 32-bit Fedora 9 Sun Java FireFox Plugin Installation
- 64-bit Fedora 9 Sun Java FireFox Plugin Installation
- Audacity seems to let me record at higher rates than my sound card supports
- Free Long-Distance Options for Submitting Audio by Telephone
- FTP Clients that are compatible with VoxForge Submission System
- How can we get the phoneme in speech recognition
- How do I Adjust my Microphone for Recording Speech?
- How do I Pronounce a Word I Don't Know?
- How to compile Julius/Julian from source
- How to Connect to VoxForge FTP Site using FireFTP
- How to Create a Quiet Environment for Recording Prompts?
- How to Provide Feedback on a Submitted Audio File
- How to Receive Email Notification of a New Post on a Forum
- How-to Rate an Audio Submission to VoxForge
- Licensing of Public Domain Audio Books (LibriVox)
- Linux: How do I Adjust my Recording Volume Levels Using Audacity?
- Linux: How do I tar my audio files and prompts for submission to VoxForge
- Linux: how to adjust your microphone volume using GNOME
- Linux: how to adjust your microphone volume using KDE
- Linux: How to Change your Audacity Preferences to Record VoxForge Speech Audio
- Linux: How to determine your audio card's, or USB mic's, maximum sampling rate
- Posts: Nested and Flat Layouts
- Project Gutenberg and Librivox Copyright Status
- Speech Submission Mirrors
- Speech Submission: the Upload Link does not Appear in my Browser
- Tips for Recording VoxForge Prompts with Audacity
- What are Sampling Rate and Bits per Sample?
- What is a Desktop Command and Control Application?
- What is a Dialog Manager?
- What is a Dictation Application?
32-bit Fedora 9 Sun Java FireFox Plugin Installation
See this post (many thanks to the author, scott_glaser): Sun Java Installation - i386 (FC8-9)
64-bit Fedora 9 Sun Java FireFox Plugin Installation
(alternate title #1: How-to install Sun Java on 64-bit Fedora 9 so that signed applets will work)
(alternate title #2: How-to install 32-bit FireFox on 64-bit Fedora 9 so that the Sun Java applet will run properly in FireFox)
The information presented here was taken from posts by scott_glaser and natousayni (many thanks to the authors).
Problem:
- 64-bit Fedora 9 contains OpenJDK, which cannot run signed applets, and the VoxForge Speech Submission applet is a signed Java applet.
OpenJDK is the free implementation of Sun's Java run-time environment. The browser plugin used in Fedora 9, gcjwebplugin, does not yet support signed applets. From the Fedora Project Wiki:
Handling Java Applets
Upstream OpenJDK does not provide a plugin. The Fedora OpenJDK packages include an adaptation of gcjwebplugin, that runs untrusted applets safely in a Web browser. The plugin is packaged as java-1.6.0-openjdk-plugin.
- ...
- The gcjwebplugin adaptation does not support signed applets. Signed applets will run in untrusted mode. Experimental support for signed applets is present in the IcedTea repository, but it is not ready for deployment in Fedora.
- The gcjwebplugin security policy may be too restrictive. To enable restricted applets, run the firefox -g command in a terminal window to see what is being restricted, and then grant the restricted permission in the /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre/lib/security/java.policy file.
- Sun recommends 32-bit Java to run applets;
- 32-bit Java needs a 32-bit plugin to work in a browser;
- The 64-bit Fedora 9 implementation of FireFox is 64-bit and will not work with a 32-bit Java plugin.
Solution:
- Install 32-bit FireFox
* Add i386 Yum repository
* create a new Yum configuration file:
# gedit /etc/yum.repos.d/fedora-i386.repo
* copy these settings into the new configuration file:
[fedora-i386]
name=Fedora $releasever - i386
failovermethod=priority
baseurl=http://download.fedora.redhat.com/pub/fedora/linux/releases/$releasever/Everything/i386/os/
mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=fedora-$releasever&arch=i386
enabled=1
gpgcheck=1
includepkgs=firefox
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora
#
[updates-i386]
name=Fedora $releasever - i386 - Updates
failovermethod=priority
baseurl=http://download.fedora.redhat.com/pub/fedora/linux/updates/$releasever/i386/
mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=updates-released-f$releasever&arch=i386
enabled=1
gpgcheck=1
includepkgs=firefox
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora
* Remove the default Firefox (64-bit) installation
# yum -y erase firefox.x86_64
* Install 32-bit Firefox
# yum -y install firefox.i386
- Add libXtst.i386 library
# yum -y install libXtst.i386
- Don't touch your default java installation
Other how-tos state that you need to remove OpenJDK. However, other programs on Fedora 9, such as Eclipse, need the default Java. You should not have to remove your default Java installation (java-1.6.0-openjdk and java-1.6.0-openjdk-plugin), because you can use the alternatives command to select the version of Java you need. Just make sure you do not use the rpm version of Sun's Java installer, because it changes the /usr/bin/java executable so that it no longer points to the "alternatives" command symbolic links.
- Create new Java directory
# mkdir /usr/java
# cd /usr/java
- Download Sun's Java to your new Java directory
www.java.com/en/download
(the .bin file NOT the rpm.bin - because the rpm changes the /usr/bin/java executable to not point to the "alternatives" command symbolic links).
- Execute the bin
# chmod +x jre*
# ./jre-6u7-linux-i586.bin
- Link FireFox plugins to the new Sun Java
- Manually
* for a particular user:
# cd /home/yourusername/.mozilla/plugins
# ln -s /usr/java/jre1.6.0_07/plugin/i386/ns7/libjavaplugin_oji.so
* for all users:
# cd /usr/lib/mozilla/plugins
# ln -s /usr/java/jre1.6.0_07/plugin/i386/ns7/libjavaplugin_oji.so
- OR -
- Use alternatives command (for all users)
(You can also use the alternatives command to set the plugin link, since FireFox is the only app that needs the Sun Java, and it uses libjavaplugin.so rather than the libjavaplugin.so.x86_64 used by other programs (like Eclipse) on 64-bit Fedora 9.)
# /usr/sbin/alternatives --install /usr/lib/mozilla/plugins/libjavaplugin.so libjavaplugin.so /usr/java/jre1.6.0_07/plugin/i386/ns7/libjavaplugin_oji.so 2
# /usr/sbin/alternatives --config libjavaplugin.so
There is 1 program which provide 'libjavaplugin.so'.
Selection Command
-----------------------------------------------
*+ 1 /usr/java/jre1.6.0_07/plugin/i386/ns7/libjavaplugin_oji.so
Enter to keep the current selection[+], or type selection number: 1
Audacity seems to let me record at higher rates than my sound card supports
Audacity will let you set your Sample Rate and Bits per Sample to values higher than your Sound Card can support. It will record at the highest rate your sound card supports and then dynamically upsample the audio to the higher rate you selected, without warning you that it did so. This is NOT the approach you should take for any audio submitted to VoxForge: the upsampling, and the later downsampling for use in Acoustic Models, can introduce noise.
Please check your audio card manual to determine the highest Sampling Rate it supports. If you don't have your manual (or lost it, or never received one ...) these other FAQ entries can help you determine your max sampling rate:
Windows: How to determine your audio card's, or USB mic's, maximum sampling rate
Linux: How to determine your audio card's, or USB mic's, maximum sampling rate
Free Long-Distance Options for Submitting Audio by Telephone
US
- Telephone-to-Telephone, website setup
- ViaTalk Free Connect allows 10 minutes of free long-distance talk time.
- PC-to-Telephone
- iCall - Windows only; need to download their client
- ThePudding (beta) - Free calling in North America, uses speech recognition to display ads on your screen that are related to the conversation.
thanks,
Ken
FTP Clients that are compatible with VoxForge Submission System
- Cross-platform FTP client:
- FireFTP - requires Firefox 1.5 or greater
- Linux:
- Nautilus (Gnome)
- Windows:
- Mac
How can we get the phoneme in speech recognition
Hi, I am trying to get Sphinx to work with Indian languages and am thinking of working with the phone set for one language at a time.
What is the best approach to do this in Sphinx? Can I work directly with a phoneme decoder so that I do not have to get into the vocabulary and word dictionary?
Does Sphinx do phoneme-only output, and is the recognition real time?
For instance, if I say "Hello", how can I get phoneme output that recognizes 'h' first, then 'e', then 'l', then 'l', then 'o', each independently?
Thanks.
How do I Adjust my Microphone for Recording Speech?
If you have a headset microphone, this should be easy to do. Your microphone should be a bit to the side and below your mouth (so the microphone won't pick-up your breathing), and no more than a half inch (1-2 cm) away.
A standalone microphone makes recording a little more difficult. It is very important to keep your mouth at the same distance from the microphone for the entire duration of the recording of one file. The same applies to a handheld microphone: try to be consistent in the way you hold it when recording the prompts for one file. The distance does not have to be the same from one file to another, but it must be the same for the duration of one file.
How do I Pronounce a Word I Don't Know?
There are a few approaches:
1. Use the VoxForge Dictionary:
If you are wondering about pronunciations, the VoxForge Dictionary might provide you with some indication as to the pronunciation. For example, the word "etc" shows up as follows in the dictionary:
ETC [ETC] eh t s eh dx er ax
ETCETERA [ETCETERA] eh t s eh dx er ax
You really don't need to know how the phonemes are pronounced in this particular example, because you can see that 'ETC' and 'ETCETERA' contain the same phonemes, and therefore should be pronounced the same.
For other words you are not sure how to pronounce, you can look at their component phonemes and search for similar strings of phonemes until you find a word you know how to pronounce.
For example, for the word "windward", you would look it up in the dictionary and find:
WINDWARD [WINDWARD] w ih n d w er d
You would then search for the string "w er d" and find the word "word"
WORD [WORD] w er d
So now you know you would pronounce the word windward as "wind" + "word".
Note that this is not clearcut in all instances, because some dialects pronounce the "ward" in the word "windward" like the "ward" in the word "award", see this dictionary entry:
AWARD [AWARD] ax w ao r d
Therefore, it all depends on the target users of the speech recognition system and what their own particular dialect is. And if we are targeting an Acoustic Model to this particular dialect, we might add an entry to the dictionary like this:
WINDWARD [WINDWARD] w ih n d w ao r d
But in the non-native speaker case, where you might not have any idea how to pronounce a word, the dictionary is a good start.
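The phoneme search described above can be automated. The following is a minimal sketch, assuming the dictionary has been saved locally as a plain-text file with one entry per line in the "WORD [WORD] phonemes" layout shown above. The function name find_by_phonemes is hypothetical and is not part of any VoxForge tool.

```shell
# find_by_phonemes DICT_FILE "w er d"
# Print every dictionary entry whose pronunciation ends with the given phonemes.
find_by_phonemes() {
    awk -v target="$2" '
    {
        # Fields 1 and 2 are the word and its bracketed form; the rest are phonemes.
        seq = ""
        for (i = 3; i <= NF; i++) seq = seq (i > 3 ? " " : "") $i
        n = length(target)
        # Match only on whole-phoneme boundaries, so "er d" does not match "ter d".
        if (seq == target || (length(seq) > n && substr(seq, length(seq) - n) == " " target))
            print $0
    }' "$1"
}
```

For example, calling find_by_phonemes with a dictionary file and the string "w er d" would list both the WINDWARD and WORD entries shown above, provided they are in the file.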
2. Listen to Someone Else's Audio
Another approach might be to listen to the audio from someone else's submission to see how they pronounce it.
3. Other Resources
- LibriVox discussion re: Pronunciation Resources mentions the following resources:
How to compile Julius/Julian from source
Step 1 - Download Source Code
Create a new directory in your home directory called 'bin'. It should have the following path (replace yourusername with the username you are using on your system):
- /home/yourusername/bin
click the following link:
and save it to your new bin directory.
Extract the file using:
- Nautilus (right click the tar/gzipped file and click extract here)
- use tar from the command line:
- tar -xvzf julius-3.5.2.tar.gz
Step 2 - Compile & Install Julius
After unpacking the sources, open a command line terminal and go to the /home/yourusername/bin/julius-3.5.2 directory where you extracted your files.
configure
The default location for binaries is "/usr/local" which will put the tools in "/usr/local/bin". You need to change this default location using the "./configure" script to specify where you want the binaries installed:
To compile Julius:
$ ./configure --prefix=/home/yourusername/bin/julius-3.5.2
To compile Julian:
$ ./configure --prefix=/home/yourusername/bin/julius-3.5.2 --enable-julian
This directs the make command to put all your binaries in the following folder:
- /home/yourusername/bin/julius-3.5.2/bin
make
To build the libraries and binaries, execute the following:
$ make all
Running the following command will install them:
$ make install
Step 3 - Update your User Path
To update your user path, you need to add the '/home/yourusername/bin/julius-3.5.2/bin' path to your PATH variable. To do this, edit the '.bash_profile' file in your home directory (in Fedora you need to show 'hidden files' in Nautilus so that file names beginning with a period are displayed) and add a colon (":") and this path to the end of the PATH variable, leaving the rest of it unchanged:
# User specific environment and startup programs
PATH=$PATH:$HOME/bin:/home/yourusername/bin/julius-3.5.2/bin
Log out and log back in to make your path change effective.
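If '.bash_profile' may be read more than once, the PATH update can be guarded so that the Julius directory is only added when it is missing. This is a minimal sketch. The helper name add_to_path is hypothetical, and the directory you pass to it is the install prefix chosen in Step 2.

```shell
# add_to_path DIR: append DIR to PATH unless it is already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;     # append, leaving the existing entries unchanged
    esac
    export PATH
}
```

You would then call it with the Julius bin directory, for example add_to_path "$HOME/bin/julius-3.5.2/bin".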
How to Connect to VoxForge FTP Site using FireFTP
This how-to assumes you already have FireFTP installed in your FireFox browser. You can download FireFTP from here (note: it requires Firefox 1.5 or greater).
Using Right-click Menu
- Right-click the VoxForge FTP link;
- click "Open Link in FireFTP" in the menu window;
- Enter the password when prompted (the Login section should fill in automatically);
- Click "OK".
Connect to VoxForge FTP site from FireFTP
From FireFox:
- click tools, and
- then select FireFTP.
Inside the FireFTP tab:
- Click "Manage Accounts",
- then select "QuickConnect".
- In the Account Manager pop-up window, enter the following information where indicated:
- Host
- Login
- Password
- Click "OK"
How to Create a Quiet Environment for Recording Prompts?
Before you begin, you need to make sure that the room you are recording in is as quiet as possible. You do not need an acoustically sound-proof room, but you need to make sure that, while you are recording, there are no external noises that your microphone might pick up and which may render your speech audio files unusable for Acoustic Model creation purposes. Use common sense: you should not have any music in the background, nor any fans, air conditioners, microwaves, televisions, etc. running. In addition, make sure you turn off your speakers while recording, to avoid acoustic feedback in your audio files.
This Discussion Thread provides more ideas.
How to Provide Feedback on a Submitted Audio File
Once you have submitted some audio, it will be posted on the VoxForge download page.
Others may then download your submission and provide feedback on the recording. This will help in classifying your speech audio submission for later merging into the VoxForge Acoustic Models.
There are two ways to provide feedback for a submission:
- rate the submission - click the 'thumbs up' icon or the 'thumbs down' icon at the top right corner of the submission; or
- written feedback - if you want to provide written feedback to a submission, you can click 'reply' at the bottom of the submission to add your comments.
How to Receive Email Notification of a New Post on a Forum
To receive an email notification of any new user submissions to a forum:
- click on the title of a forum, and a list of threads appears; next
- click the subscribe link at the very top of the thread list page. You need to be logged in for the subscribe link to appear.
- You need to subscribe to each forum if you want all of them - there is no 'master subscribe'.
- When you post to a forum without logging in (as a 'visitor'), there will be a lag of a few minutes before your post will display (you need to refresh your browser for the change to display). There is no such delay if you register and log in to the system.
How-to Rate an Audio Submission to VoxForge
VoxForge is not looking for TV or radio announcer quality voices (just listen to my voice recordings ...) or perfect audio quality.
For Free and Open Source Speech Recognition to work, we need a large variety of speech (from different people, with different dialects/accents, and using different prompts files with various phonemes and triphones) recorded in a variety of environments (rooms with echo, such as hardwood floors or tiles, and rooms with no echo, such as carpet, etc.) and on a variety of recording equipment (headset mics, desktop mics, built in mics, and USB mics, integrated audio, audio cards ...).
That is not to say that you should not try to minimize non-speech noise in your audio submissions; it is just that the submissions we are looking for should reflect the environments where the acoustic model might be used for speech recognition.
Therefore, most audio submitted to the VoxForge site should receive a thumbs up. This is because it takes some effort to create a recording (when you are first starting out), and new submitters should be encouraged, not discouraged. A "thumbs up" rating would go a long way to encouraging submissions.
What should result in a thumbs down is when a transcription doesn't match its corresponding audio or when there is excessive background noise (i.e. non-speech noise or talking in the background). What is "excessive noise" is subjective, since the Acoustic Model creation process can tolerate some low level hiss and/or hum (usually heard in quiet periods of some recordings). But if enough people submit their rating of a submission, on average we should get a good view of the quality of a recording for use in the creation of acoustic models.
Licensing of Public Domain Audio Books (LibriVox)
VoxForge uses Public Domain Audio Book (LibriVox) recordings to create derivative works, which will be licensed under the GPL (with all applicable rights held by the Free Software Foundation).
This will not affect the legal status of the recording you submitted here or anywhere else (e.g. LibriVox). Therefore you will still be able to put your recording into the public domain (or it will remain in the public domain if it's already in there).
Linux: How do I Adjust my Recording Volume Levels Using Audacity?
First make sure your microphone volume in Audacity is set to 1.0. Then click Record (i.e. the red circle button), speak in your normal voice for a few seconds, and then click Stop (i.e. the yellow square button).
Look at the Waveform Display for the audio track you just created (see image below). The Vertical Ruler to the left of the Waveform Display provides you with a guide to the audio levels. Try to keep your recording levels between 0.5 and -0.5, averaging around 0.3 to -0.3. It is OK to have a few spikes go outside the 0.5 to -0.5 range, but avoid having any go beyond the 1.0 to -1.0 range, as this will generate distortion (see image):
If your Sound Level is too Low
If you have increased your volume to the maximum and still are not getting an acceptable sound level, you may need to turn on the 'Mic Boost' switch in your Linux mixer. Fedora's mixer (i.e. gnome-volume-control) is the "Volume Control" utility located in Applications>Sound & Video>Volume Control menu. Select the 'Switches' tab of the Volume Control utility and then select 'Mic Boost (+20dB)' (see image below):
Note:
Audacity's microphone volume control overrides any other microphone
volume settings you may have in your Linux mixer (i.e. in the Capture
tab).
Hit the ctrl-z key in Audacity (to 'undo' your previous recording) and try recording again.
If your Sound Level is too High
If the waveform display on your track extends beyond the 1.0 to -1.0 range (i.e. the waveforms have been clipped off at the top or bottom), your volume is too high. Reduce it with Audacity's microphone volume control, hit ctrl-z in Audacity, and try again. From a speech recognition perspective, it is better to err on the side of a lower volume level, because clipped speech sounds distorted.
Once you are satisfied that the volume is acceptable, try playing the file back by clicking Play (i.e. the green triangle button) in Audacity. You will likely need to adjust the Master Volume and the PCM Volume sliders for your speakers under the 'Playback' tab in your Volume Control utility, see image:
Note:
Your Audacity Volume control slider and your mixer's PCM Volume Control
slider move in tandem - i.e. moving one will move the other. But
you may still need to adjust your Master Volume control in your mixer
to hear sound from your speakers.
You need to hear your utterances after each recording to make sure they sound OK - but make sure that your speakers are turned off when you are recording. Hit ctrl-z in Audacity to remove the track you just created.
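The level guidance in this entry can also be checked numerically on an exported file. This is a rough sketch, assuming a 16-bit PCM WAV file with the canonical 44-byte header and a little-endian machine (such as x86). The function name peak_level is hypothetical. It prints the largest sample magnitude on the same 0.0 to 1.0 scale as Audacity's Vertical Ruler.

```shell
# peak_level FILE: print the peak sample magnitude of a 16-bit PCM WAV file (0.00 to 1.00).
peak_level() {
    # Skip the 44-byte header and dump the samples as signed 16-bit integers.
    # Assumption: od reads them in host byte order, which must be little-endian.
    od -An -v -t d2 -j 44 "$1" | awk '
        BEGIN { max = 0 }
        {
            for (i = 1; i <= NF; i++) {
                v = ($i < 0) ? -$i : $i
                if (v > max) max = v
            }
        }
        END { printf "%.2f\n", max / 32768 }'
}
```

A result close to 1.00 suggests the recording was clipped, while a result around 0.50 is in line with the range recommended above.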
Linux: How do I tar my audio files and prompts for submission to VoxForge
Please create a single compressed tar file containing the following files:
- your prompts file;
- your audio files (in "wav" format);
- your README and LICENSE files.
Name your tar file as follows: "[voxforge username]-[year][month][day].tgz". For example, if you stored all these files in the /home/myusername/train folder, you would execute the following command to create your gzipped tar file:
$ cd /home/myusername
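The archive step itself can be sketched as follows. This is a minimal example, assuming GNU tar and the /home/myusername/train layout from the example above. The function name make_submission is hypothetical. It builds the file name from the naming rule given in this entry.

```shell
# make_submission USERNAME DIR
# Archive DIR as USERNAME-YYYYMMDD.tgz in DIR's parent directory and print the file name.
make_submission() {
    username=$1
    dir=$2
    archive="${username}-$(date +%Y%m%d).tgz"
    parent=$(dirname "$dir")
    # -C changes to the parent directory so the archive holds relative paths (train/...).
    tar -czf "$parent/$archive" -C "$parent" "$(basename "$dir")" &&
        echo "$archive"
}
```

For the example above you would run make_submission myusername /home/myusername/train, which leaves the archive in /home/myusername.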
Linux: how to adjust your microphone volume using GNOME
To set your microphone volume in Linux, you need to use your distro's mixer. To start the GNOME mixer, select:
System>Preferences>Volume Control
and then click the Capture tab:
Move the sliders up or down to increase or decrease your microphone's recording volume.
Determining optimal microphone volume settings
First make sure your microphone slider is set to its mid-point. Then click Record in the VoxForge Speech Submission Application, speak in your normal voice for a few seconds, and then click Stop.
Look at the Waveform Display for the recording you just created. Adjust your microphone volume up or down depending on the size of the Waveforms.
If your Sound Level is too Low
If you have increased your volume to the maximum and still are not getting an acceptable sound level, you may need to either increase the volume settings or turn on the 'Mic Boost' switch in your Linux mixer. Select the 'Switches' tab of the Volume Control utility and then select 'Mic Boost (+20dB)' (see image below):
Try re-recording some speech - you might have to reduce your microphone volume to compensate for the Mic Boost.
If your Sound Level is too High
If the waveforms in the display have been clipped off at the top or bottom, then your volume is too high. Reduce your microphone volume, and re-record some speech. It is better to err on the side of having a lower volume level from a speech recognition perspective - clipped speech sounds distorted. But you also need it to be loud enough such that you can see your speech waveforms in the display (i.e. you should be able to see squiggly lines that correspond to your speech).
Adjusting your Playback Volume
Once you are satisfied that the volume is acceptable, try playing the file back by clicking Play . You will likely need to adjust the Master Volume and the PCM Volume sliders for your speakers in your Volume Control utility, see image:
Linux: how to adjust your microphone volume using KDE
To set your microphone volume in Linux, you need to use your distro's mixer. To start the KDE mixer (which is included with the "kdemultimedia" package), select:
System>Multimedia>KMix
and then click the Input tab:
Move the "Mic" slider up or down to increase or decrease your microphone's recording volume.
Determining optimal microphone volume settings
First make sure your microphone slider is set to its mid-point. Then click Record in the VoxForge Speech Submission Application, speak in your normal voice for a few seconds, and then click Stop.
Look at the Waveform Display for the recording you just created. Adjust your microphone volume up or down depending on the size of the Waveforms.
If your Sound Level is too Low
If you have increased your volume to the maximum and still are not getting an acceptable sound level, you may need to either increase the volume settings or turn on the 'Mic Boost' switch in your Linux mixer. Select the 'Switches' tab of the KMix utility and then select 'Mic Boost (+20dB)' (see image below):
Try re-recording some speech - you might have to reduce your microphone volume to compensate for the Mic Boost.
If your Sound Level is too High
If the waveforms in the display have been clipped off at the top or bottom, then your volume is too high. Reduce your microphone volume, and re-record some speech. It is better to err on the side of having a lower volume level from a speech recognition perspective - clipped speech sounds distorted. But you also need it to be loud enough such that you can see your speech waveforms in the display (i.e. you should be able to see "squiggly" lines that correspond to your speech).
Adjusting your Playback Volume
Once you are satisfied that the volume is acceptable, try playing the file back by clicking Play. You might need to adjust the Master Volume and the PCM Volume sliders for your speakers under the KMix "Output" tab.
Linux: How to Change your Audacity Preferences to Record VoxForge Speech Audio
VoxForge collects speech audio at the highest Sample Rate that your Sound Card can support (up to a Sampling Rate of 48kHz, at 16 Bits Per Sample). You'll need to look at your Sound Card's manual to determine the maximum it supports (see this FAQ entry for more info on your sound card and recording rates). For this example we will assume a 48kHz Sample Rate.
Project Sampling Rate
In Audacity, you set the Project Sampling Rate in your Preferences. First go to 'File', then select 'Preferences...', next click the 'Quality' tab, and then set your 'Default Sample Rate' by clicking the up/down arrows to change it to 48000Hz (the default is usually 44100Hz), see image below:
Sample Format
Still in the 'Preferences...' menu, and still under the 'Quality' tab,
click the 'Default Sample Format'. Click the up/down arrows
to change it to 16-bit, see image above.
Channels
While still in the 'Preferences...' menu, click the 'Audio I/O' tab, and then set your 'Channels' to 1 (Mono), see image below:

Export File Format
While still in the 'Preferences...' menu, click the 'File Formats' tab, and then set your 'Uncompressed Export Format' to WAV (Microsoft 16 bit PCM), see image below:
You can also submit speech using FLAC format.
Note: Please only submit audio files in an uncompressed format such as WAV or AIFF, or in a lossless compressed format such as FLAC.
Click OK to save your settings.
Making your settings active
Now you need to exit and re-start Audacity to make these Project Setting changes active. In Audacity, click File>Exit. Restart Audacity by clicking Applications>Sound & Video>Audacity.
Look at the Project Rate selector in the bottom left-hand corner of the Audacity window and make sure it says 48000. If it does, then you are ready to continue. If not, re-check your Preferences to make sure your settings are correct.
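After exporting a recording, you can also confirm that the file itself carries the settings chosen above. This is a minimal sketch, assuming a PCM WAV file with the canonical 44-byte header (channel count at byte 22, sample rate at byte 24, bits per sample at byte 34). The function names read_le and wav_info are hypothetical.

```shell
# read_le FILE OFFSET NBYTES: print the unsigned little-endian integer stored there.
read_le() {
    value=0
    mult=1
    # od prints each byte as a decimal number; combine them least significant first.
    for byte in $(od -An -v -t u1 -j "$2" -N "$3" "$1"); do
        value=$((value + byte * mult))
        mult=$((mult * 256))
    done
    echo "$value"
}

# wav_info FILE: print the channel count, sample rate and bits per sample.
wav_info() {
    echo "channels=$(read_le "$1" 22 2) rate=$(read_le "$1" 24 4) bits=$(read_le "$1" 34 2)"
}
```

For a file exported with the settings in this entry, wav_info should print channels=1 rate=48000 bits=16. Files with extra header chunks would need a proper WAV parser instead.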
Linux: How to determine your audio card's, or USB mic's, maximum sampling rate
To submit audio to VoxForge, you need to make sure your Sound Card and your device driver both support a 48kHz sampling rate at 16 bits per sample.
You can use arecord, the command-line sound recorder (and player) for the ALSA
sound-card driver. It should be included with your Linux
distribution (type in "man arecord" at the command line to confirm
this).
The approach here is to use the 'arecord' command to try to record your speech at a sampling rate higher than what your sound card supports. arecord balks at this and returns a warning message stating the maximum rate your sound card or USB mic can give you. Details of this approach can be found near the end of this thread (go to the second page). Many thanks to Robin for helping out on this one.
1. Sound Card or Integrated Audio
If you have a sound card or audio processing integrated into your motherboard, get a list of all the audio devices on your PC by executing this command:
$ arecord --list-devices
You should get output similar to this:
**** List of CAPTURE Hardware Devices ****
card 0: IXP [ATI IXP], device 0: ATI IXP AC97 [ATI IXP AC97]
Subdevices: 1/1
Subdevice #0: subdevice #0
This says that my integrated audio card is on card 0, device 0.
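If you have several capture devices, the card and device numbers can be pulled out of the listing automatically. This is a minimal sketch, assuming the listing has the same "card N: ..., device M: ..." layout as the output above. The function name list_hw_devices is hypothetical.

```shell
# list_hw_devices: read `arecord --list-devices` output on standard input and
# print one hw:CARD,DEVICE name per capture device found.
list_hw_devices() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/hw:\1,\2/p'
}
```

You would use it as arecord --list-devices | list_hw_devices, and pass one of the printed names to the -D option of arecord.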
Next, try to record your speech at a rate higher than what you think your highest recording rate might be (replacing the numbers in hw:0,0 with your card and device number):
$ arecord -f dat -r 60000 -D hw:0,0 -d 5 test.wav
The 60000 corresponds to a sampling rate of 60kHz. Your output should look something like this:
Recording WAVE 'test.wav' : Signed 16 bit Little Endian, Rate 60000 Hz, Stereo
Warning: rate is not accurate (requested = 60000Hz, got = 48000Hz)
please, try the plug plugin (-Dplug:hw:0,0)
Aborted by signal Interrupt...
This tells us that the maximum sampling rate supported on my integrated audio card is 48000Hz (or 48kHz). You may have to experiment with different sampling rates to get the Warning message.
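Reading the rate out of the warning can be scripted as well. This is a minimal sketch, assuming the warning has the same wording as in the output above. The function name max_rate_from_warning is hypothetical.

```shell
# max_rate_from_warning: read arecord's messages on standard input and print the
# rate the hardware actually delivered, taken from the "got = ...Hz" part of the warning.
max_rate_from_warning() {
    sed -n 's/.*requested = [0-9]*Hz, got = \([0-9]*\)Hz.*/\1/p'
}
```

You would use it as arecord -f dat -r 60000 -D hw:0,0 -d 5 test.wav 2>&1 | max_rate_from_warning. The 2>&1 is there in case the warning is written to standard error. If nothing is printed, the requested rate was accepted, and you should try a higher one.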
2. USB Microphone or USB audio pod
If you have USB based audio, first get a list of all the audio devices on your PC using this command:
$ arecord --list-devices
You should get a listing similar to this:
[...]
card 1: default [Samson C01U ], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
This says that the USB microphone is on card 1, device 0.
Next, try to record your speech at a rate higher than what you think your highest recording rate might be (replacing the numbers in hw:1,0 with your card and device number):
$ arecord -f S16_LE -r 60000 -D hw:1,0 -d 5 testS16_LE.wav
"S16_LE" means 'Signed 16 bit Little Endian'. This command will output something like this:
Recording WAVE 'test.wav' : Signed 16 bit Little Endian, Rate 60000 Hz, Stereo
Warning: rate is not accurate (requested = 60000Hz, got = 48000Hz)
please, try the plug plugin (-Dplug:hw:0,0)
Aborted by signal Interrupt...
The arecord output tells us that the maximum sampling rate supported on my USB microphone is 48000Hz (or 48kHz). You may have to experiment with different sampling rates to get the Warning message.
There is some additional information on USB mics on the Audacity site.
Posts: Nested and Flat Layouts
A message or post is the smallest unit in a discussion. A threaded discussion (or thread) is a series of posts related to the same topic or subject.
A post's layout can be set using the Nested/Flat link in a VoxForge forum (located on the top right hand corner of a thread) and can be:
- flat - posts appear one after another;
- nested - posts are grouped in a tree-like structure. Messages are usually grouped visually in a hierarchy by topic. A set of messages grouped in this way is called a topic thread or simply a "thread". New posts do not necessarily appear at the end of the discussion thread, but appear under the post that is being replied to.
Example of flat structure:
- post 1
- post 2
- post 3
- post 4
- ...
Nested structure:
- post 1
- reply to post 1
- another reply to post 1
- post 2
- reply
- reply
- ...
- reply
Flat/nested choice often determines the way people discuss. In the flat layout there is only one "path" - that is why it is sometimes called "linear".
The nested layout offers more freedom, more digressions and more paths in the discussion, but it is more difficult to spot new posts (unless you use an RSS feed or watch the thread).
Project Gutenberg and Librivox Copyright Status
LibriVox
Librivox audio is public domain (they use a Creative Commons Public Domain Dedication). They use ebooks obtained from Project Gutenberg. Many Project Gutenberg ebooks are also public domain (not all). To make sure that they only release audio readings of public domain texts, Librivox relies on Project Gutenberg's legal work to assure Copyright status of their books.
Project Gutenberg
The Acoustic Model creation process requires that we segment any
user-submitted audio (and its corresponding text transcriptions) into
5-10 second speech audio snippets. But, in doing so, we would contravene
Project Gutenberg's Trademark licensing terms if we kept any
references to Gutenberg in the eText that accompanies the speech
audio. For this reason, we need to remove all references to Gutenberg
in any speech audio and text submission made to VoxForge.
To distribute Project Gutenberg e-texts with the "Project Gutenberg" trademark name, you must follow some licensing provisions that include a requirement that the text not be broken up in any way, and pay a licensing fee. If you don't use the Project Gutenberg Name, and delete any reference to it in the text, you can distribute the text in any way you see fit.
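Before submitting, it can help to list every line of a text that still mentions Gutenberg so you can review and delete it by hand. The sketch below is one possible way to do that, assuming a plain-text file; the function name and the sample text are hypothetical, and a simple word search is not a substitute for reading the text yourself.

```python
import re

# Case-insensitive search for the word "Gutenberg" anywhere in a line.
GUTENBERG = re.compile(r"gutenberg", re.IGNORECASE)


def find_gutenberg_references(text):
    """Return (line_number, line) pairs for every line that mentions Gutenberg."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if GUTENBERG.search(line)
    ]


# Hypothetical sample text, used only to show the output format.
sample_text = """This PROJECT GUTENBERG-tm etext is a public domain work.
Chapter 1
The first line of the story itself."""

for number, line in find_gutenberg_references(sample_text):
    print(f"line {number}: {line}")
```

The function only reports matches; deciding what to delete is left to the person preparing the submission.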
For example, for the Herman Melville book 'Typee', the Librivox audio is public domain (it uses the Creative Commons Dedication). The text of the Gutenberg Typee ebook includes a "license", which says the following in the intro:
ABOUT PROJECT GUTENBERG-TM ETEXTS
This PROJECT GUTENBERG-tm etext, like most PROJECT GUTENBERG-
tm etexts, is a "public domain" work distributed by Professor
Michael S. Hart through the Project Gutenberg Association at
Carnegie-Mellon University (the "Project"). Among other
things, this means that no one owns a United States copyright
on or for this work, so the Project (and you!) can copy and
distribute it in the United States without permission and
without paying copyright royalties.
Then it goes on to say:
Special rules, set forth
below, apply if you wish to copy and distribute this etext
under the Project's "PROJECT GUTENBERG" trademark.
So basically it says that no one owns copyright on the written text of this book in the US (and likely most other jurisdictions), and you can copy and distribute it as you please. But, if you want to copy and distribute the book along with references to the Gutenberg trademark, then you need to follow some special rules.
Further on in the document it says:
DISTRIBUTION UNDER "PROJECT GUTENBERG-tm"
You may distribute copies of this etext electronically, or by
disk, book or any other medium if you either delete this
"Small Print!" and all other references to Project Gutenberg,
This clarifies what you need to do if you want to distribute the ebook without any restrictions - basically you delete the 'license' and the Gutenberg trademarks.
It then goes on to elaborate the conditions you must follow if you do want to distribute the text with the Gutenberg trademarks:
or:
[1] Only give exact copies of it ...
[2] Honor the etext refund and replacement provisions of this
"Small Print!" statement.
[3] Pay a trademark license fee to the Project of 20% of the
net profits ...
Note: I am not a lawyer, and this is not a legal opinion.
Speech Submission Mirrors
Full Mirror of VoxForge site (thanks to Coral Cache):
Partial Mirrors (only VoxForge Speech Submission app & some supporting docs):
Speech Submission: the Upload Link does not Appear in my Browser
You need JavaScript enabled in your browser in order for the upload link to appear on the submission page.
Tips for Recording VoxForge Prompts with Audacity
The easiest way to record VoxForge Prompts with Audacity is to open the prompts file in its own browser window or tab. Then maximize your browser to take up all of your screen.
Next, open Audacity in a smaller window - almost the same width as your browser but only 1/4 the height.
Use the top of the Audacity window as a ruler to highlight the line that you are reading. When you finish one line, use your mouse to move the Audacity window down one line.
When you get too close to the bottom of your screen, just scroll up the prompts file in your browser window, and continue recording your prompts.
What are Sampling Rate and Bits per Sample?
From the Audacity Digital Audio Tutorial:
The main device used in digital recording is an Analog-to-Digital Converter (ADC). The ADC captures a snapshot of the electric voltage on an audio line and represents it as a digital number that can be sent to a computer. By capturing the voltage thousands of times per second, you can get a very good approximation to the original audio signal:
Each dot in the figure above represents one audio sample. There are two factors that determine the quality of a digital recording:
Sample rate: The rate at which the samples are captured or played back, measured in Hertz (Hz), or samples per second. An audio CD has a sample rate of 44,100 Hz, often written as 44 KHz for short. This is also the default sample rate that Audacity uses, because audio CDs are so prevalent.
Sample format or sample size: Essentially this is the number of digits in the digital representation of each sample. Think of the sample rate as the horizontal precision of the digital waveform, and the sample format as the vertical precision. An audio CD has a precision of 16 bits, which corresponds to about 5 decimal digits.
Higher sampling rates allow a digital recording to accurately record higher frequencies of sound. The sampling rate should be at least twice the highest frequency you want to represent. Humans can't hear frequencies above about 20,000 Hz, so 44,100 Hz was chosen as the rate for audio CDs to just include all human frequencies. Sample rates of 96 and 192 KHz are starting to become more common, particularly in DVD-Audio, but many people honestly can't hear the difference.
Higher sample sizes allow for more dynamic range - louder louds and softer softs. If you are familiar with the decibel (dB) scale, the dynamic range on an audio CD is theoretically about 90 dB, but realistically signals that are -24 dB or more in volume are greatly reduced in quality. Audacity supports two additional sample sizes: 24-bit, which is commonly used in digital recording, and 32-bit float, which has almost infinite dynamic range, and only takes up twice as much storage as 16-bit samples.
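The figures in the tutorial can be checked with a little arithmetic. The sketch below uses the standard rule of thumb that each bit of an integer sample adds about 6 dB of dynamic range; that formula and the function names are assumptions added here for illustration and do not come from the Audacity tutorial. It gives roughly 96 dB for 16 bits, in the same ballpark as the "about 90 dB" quoted above, and it does not apply to 32-bit float samples.

```python
import math


def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


def dynamic_range_db(bits):
    """Theoretical dynamic range of integer samples: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)


def decimal_digits(bits):
    """Number of decimal digits needed to cover 2 ** bits distinct levels."""
    return math.log10(2 ** bits)


def bytes_per_second(sample_rate_hz, bits, channels=1):
    """Uncompressed storage needed for one second of audio."""
    return sample_rate_hz * bits // 8 * channels


print(nyquist_limit_hz(44100))              # half of the CD sample rate
print(round(dynamic_range_db(16), 1))       # dynamic range of 16-bit samples, in dB
print(round(decimal_digits(16), 1))         # "about 5 decimal digits"
print(bytes_per_second(44100, 16))          # mono, 16-bit, CD sample rate
```

The first value shows why 44,100 Hz is enough to cover frequencies up to about 20,000 Hz: the limit is 22,050 Hz.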
Here are some additional articles that provide more information on sampling rate and bit depth (i.e. bits per sample):
- Discussion of the mysteries behind bit-depth, sample rates and sound quality
- Sample rate and bit depth - an introduction to sampling
What is a Desktop Command and Control Application?
It typically refers to a capability of voice recognition systems on a personal computer that
lets you select menus and other functions by speaking the commands into
a microphone.
What is a Dialog Manager?
A Dialog Manager is one component of a Speech Recognition System.
Telephony and Command & Control Dialog Managers
A Dialog Manager used in Telephony applications (IVR - Interactive Voice Response), and in some desktop Command and Control Applications, assigns meaning to the words recognized by the Speech Recognition Engine, determines how the utterance fits into the dialog spoken so far, and decides what to do next. It might need to retrieve information from an external source. If a response to the user is required, it will choose the words and phrases to be used in its response, and transmit these to the Text-to-Speech System to speak the response to the user.
Dictation Dialog Manager
A Dictation Dialog Manager will typically take the words recognized by the Speech Recognition Engine and type out the corresponding text on your computer screen. It may also have some Command and Control elements, but these are usually limited to the types of commands typically used in a word processing program. It usually responds to the user using text (i.e. it might not use Text to Speech to respond to the user).
Examples
Examples of Telephony Dialog Managers include:
An example of a Command & Control Dialog Manager:
- Gnome-Voice-Control (uses PocketSphinx)
- SpeechLion (uses Sphinx4)
- PerlBox (uses Sphinx2)
- Simon (uses Julius)
- xVoice (needs IBM's ViaVoice engine for Linux - no longer available)
- Evaldictator (uses Sphinx4)
You can also write a domain-specific application to perform Dialog Manager-like
tasks using a traditional programming language (C, C++, Java, etc.)
or a scripting language (Perl, Python, Ruby, etc.).
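As a rough illustration of such a domain-specific application, here is a minimal sketch of a Command and Control style dialog manager. It assumes the Speech Recognition Engine hands over the recognized words as a plain string; the command table, the action names and the "do that again" phrase are all hypothetical, and no real speech engine or Text-to-Speech system is called.

```python
# Hypothetical command table: recognized phrase -> (action name, response text).
COMMANDS = {
    "open browser": ("launch_browser", "Opening the browser."),
    "close window": ("close_window", "Closing the window."),
}


def handle_utterance(recognized_text, history):
    """Assign meaning to recognized words and decide what to do next.

    `history` is a list of previously accepted commands, so that an utterance
    such as "do that again" can be interpreted in light of the dialog so far.
    Returns (action_name or None, response_text).
    """
    # Normalize case and whitespace before looking up the phrase.
    phrase = " ".join(recognized_text.lower().split())

    # Use the dialog history to resolve a reference to the previous command.
    if phrase == "do that again" and history:
        phrase = history[-1]

    if phrase in COMMANDS:
        history.append(phrase)
        return COMMANDS[phrase]
    return (None, "Sorry, I did not understand that.")


history = []
for utterance in ["Open  Browser", "do that again", "make coffee"]:
    action, response = handle_utterance(utterance, history)
    print(f"{utterance!r} -> action={action}, response={response}")
```

In a real application the returned action would be dispatched to the desktop or telephony system, and the response text would be shown on screen or passed to a Text-to-Speech system.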
What is a Dictation Application?
A Dictation application uses Speech Recognition to translate your speech into written text on your computer.
A Dictation application lets you speak into a microphone attached to your computer, and have the text appear on your computer screen. It can recognize a larger number and variety of words than a Command and Control application, and it can recognize arbitrary phrases with words in any order.
This is different from a Command and Control application which also uses speech recognition but is limited to controlling your computer and software applications by speaking short commands. Here, the vocabulary that the speech recognition engine can recognize is much smaller than in dictation, and is limited to a small set of words and predefined phrases.
Commercial dictation systems usually include a command and control system.