Decoding visemes: Improving machine lip-reading
Published 2016 in IEEE International Conference on Acoustics, Speech, and Signal Processing

ABSTRACT
In machine lip-reading, we aim to recognise speech from a visual signal. Current work often uses viseme classification supported by language models, with varying degrees of success. A few recent works suggest that phoneme classification can, in the right circumstances, outperform viseme classification. In this work we present a novel two-pass method of training phoneme classifiers which uses previously trained visemes in the first pass. With our new training algorithm, we show classification performance which significantly improves on previous lip-reading results.
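The two-pass idea in the abstract can be sketched in code: first collapse phoneme labels onto visemes and train viseme classifiers, then train per-viseme phoneme classifiers to separate the phonemes that share a viseme. This is a minimal illustration under assumed components; the phoneme-to-viseme map, the toy feature vectors, and the nearest-centroid classifiers are placeholders, not the paper's actual models or mapping.

```python
# Hedged sketch of two-pass viseme-then-phoneme classification.
# The P2V map, features, and nearest-centroid models are illustrative
# assumptions, not the paper's actual implementation.
import math

# Assumed many-to-one phoneme -> viseme mapping: visually identical
# phonemes (e.g. /p/, /b/, /m/) collapse onto a single viseme.
P2V = {"p": "bilabial", "b": "bilabial", "m": "bilabial",
       "f": "labiodental", "v": "labiodental"}

def centroids(X, y):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for x, lab in zip(X, y):
        s = sums.setdefault(lab, [0.0] * len(x))
        for j, val in enumerate(x):
            s[j] += val
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}

def nearest(cents, x):
    """Label of the centroid closest to feature vector x."""
    return min(cents, key=lambda lab: math.dist(cents[lab], x))

def two_pass_fit(X, phonemes):
    # Pass 1: collapse phoneme labels to visemes, train viseme centroids.
    visemes = [P2V[p] for p in phonemes]
    viseme_model = centroids(X, visemes)
    # Pass 2: within each viseme, train phoneme centroids that split
    # the set of phonemes sharing that viseme.
    phoneme_models = {}
    for v in set(visemes):
        Xi = [x for x, lab in zip(X, visemes) if lab == v]
        yi = [p for p, lab in zip(phonemes, visemes) if lab == v]
        phoneme_models[v] = centroids(Xi, yi)
    return viseme_model, phoneme_models

def two_pass_predict(viseme_model, phoneme_models, x):
    v = nearest(viseme_model, x)          # first pass: which viseme?
    return nearest(phoneme_models[v], x)  # second pass: which phoneme?
```

The second pass only has to discriminate between the small set of phonemes mapped to one viseme, which is the motivation for seeding phoneme classification with previously trained visemes.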
PUBLICATION RECORD
- Publication date
2016-03-20
- Fields of study
Computer Science
CITED BY
45 citing papers