Intel Audio-Visual Speech Recognition



User: kmaclean
Date: 1/10/2007 9:16 am


Another possible use of the VoxForge Acoustic Models:

From the Intel Audio-Visual Speech Recognition site:

The increase in the number of multimedia applications that require robust speech recognition systems determined a large interest in the study of audio-visual speech recognition (AVSR) systems. The use of visual features in AVSR is justified by both the audio and visual modality of the speech generation and the need for features that are invariant to acoustic noise perturbation. The speaker-independent audio-visual continuous speech recognition system relies on a robust set of visual features obtained from the accurate detection and tracking of the mouth region. Further, the visual and acoustic observation sequences are integrated using a coupled hidden Markov model (CHMM). The statistical properties of the CHMM can model the audio and visual state asynchrony while preserving their natural correlation over time. The experimental results show that the current system tested on the XM2VTS database (295 speakers) reduces by over 55% the error rate of the audio-only speech recognition system at an SNR of 0 dB.
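For anyone wondering what "coupled" means in practice, here is a rough sketch of the forward (likelihood) computation for a toy two-chain CHMM. It is not taken from the Intel code: the function names, the two-state chains, the discrete observation symbols, and every probability value are made up for illustration. Each chain's next state is conditioned on the previous states of both chains, which is what lets the audio and visual chains sit in different states at the same time step while still influencing each other.

```python
from itertools import product

def chmm_likelihood(obs_a, obs_v, init, trans_a, trans_v, emit_a, emit_v):
    """Forward algorithm over the joint (audio, visual) state space.

    init[(a, v)]         : P(a_0 = a, v_0 = v)
    trans_a[(a0, v0)][a] : P(a_t = a | a_{t-1} = a0, v_{t-1} = v0)
    trans_v[(a0, v0)][v] : P(v_t = v | a_{t-1} = a0, v_{t-1} = v0)
    emit_a[a][o]         : P(audio symbol o | audio state a)
    emit_v[v][o]         : P(visual symbol o | visual state v)
    """
    states = list(init)
    # alpha[(a, v)] = P(observations so far, a_t = a, v_t = v)
    alpha = {
        (a, v): init[(a, v)] * emit_a[a][obs_a[0]] * emit_v[v][obs_v[0]]
        for a, v in states
    }
    for oa, ov in zip(obs_a[1:], obs_v[1:]):
        alpha = {
            (a, v): emit_a[a][oa] * emit_v[v][ov] * sum(
                alpha[prev] * trans_a[prev][a] * trans_v[prev][v]
                for prev in states
            )
            for a, v in states
        }
    return sum(alpha.values())

# Hypothetical toy parameters: 2 audio states, 2 visual states, 2 symbols each.
INIT = {(0, 0): 0.7, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.1}
TRANS_A = {(0, 0): [0.8, 0.2], (0, 1): [0.5, 0.5],
           (1, 0): [0.1, 0.9], (1, 1): [0.05, 0.95]}
TRANS_V = {(0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8],
           (1, 0): [0.6, 0.4], (1, 1): [0.1, 0.9]}
EMIT_A = [[0.9, 0.1], [0.2, 0.8]]
EMIT_V = [[0.7, 0.3], [0.4, 0.6]]

def total_probability(length):
    """Sum the likelihood over every audio/visual symbol sequence pair."""
    seqs = list(product(range(2), repeat=length))
    return sum(
        chmm_likelihood(oa, ov, INIT, TRANS_A, TRANS_V, EMIT_A, EMIT_V)
        for oa in seqs for ov in seqs
    )

if __name__ == "__main__":
    p = chmm_likelihood([0, 0, 1], [0, 1, 1],
                        INIT, TRANS_A, TRANS_V, EMIT_A, EMIT_V)
    print("likelihood of one observation pair:", p)
    print("sum over all length-3 pairs:", total_probability(3))
```

The discrete symbols are only there to keep the sketch short; a real recognizer would score continuous acoustic and visual feature vectors instead. Summing the likelihood over every possible observation pair is a quick sanity check that the toy tables define a proper probability distribution.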

Open source code for AVCSR can be downloaded from http://sourceforge.net/projects/opencvlibrary/.

--- (Edited on 1/10/2007 10:16 am [GMT-0500] by kmaclean) ---

--- (Edited on 6/8/2015 9:40 am [GMT-0400] by kmaclean) ---