Lip Enhancement and Multi-View Simulation for Robust Visual Speech Recognition in MAVSR 2025

Fei Su, Cancan Li, Juan Liu

Published 2025 in IEEE International Conference on Automatic Face & Gesture Recognition

ABSTRACT

In this paper, we present our work on Visual Speech Recognition (VSR) for the Mandarin Audio-Visual Speech Recognition (MAVSR) Challenge 2025, with a particular focus on improving lipreading under challenging visual conditions. The proposed system leverages cross-modal knowledge transfer and employs a progressive training strategy based on large-scale speech and visual speech datasets. Furthermore, we introduce LIPER, a visual enhancement module that generates improved lip-region visual data under conditions such as low resolution, poor illumination, and color distortion. LIPER also enables the synthesis of multi-view lip movements through lip pose estimation and 3D reconstruction. These enhancements significantly improve the robustness of the VSR system under low-quality visual conditions. Experimental results show that the proposed approach achieves a relative character error rate (CER) reduction of 16.1% on the MOV20-Test set compared to the official baseline system in Track 1, and takes second place among the systems submitted to the challenge. The code is available at https://github.com/yaku122/RVSR.
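The abstract lists the degradations LIPER targets: low resolution, poor illumination, and color distortion. As a rough illustration of what an enhancement stage for such a lip crop might do, the sketch below applies gamma correction (illumination), gray-world white balance (color cast), and nearest-neighbour upsampling (resolution). This is a minimal numpy-only sketch under assumed steps; it is not the published LIPER design, and the function name and parameters are hypothetical.

```python
import numpy as np

def enhance_lip_crop(crop, gamma=0.7, scale=2):
    """Illustrative preprocessing for a low-quality lip crop.

    `crop` is an (H, W, 3) uint8 image. Steps (assumptions, not the
    published LIPER design):
      1. Gamma correction (gamma < 1) to lift poorly illuminated frames.
      2. Gray-world white balance to reduce a global color cast.
      3. Integer-factor nearest-neighbour upsampling against low resolution.
    """
    x = crop.astype(np.float32) / 255.0
    # 1) gamma < 1 brightens dark regions more than bright ones
    x = np.power(x, gamma)
    # 2) gray-world: rescale each channel so its mean matches the global mean
    channel_means = x.reshape(-1, 3).mean(axis=0)
    x = x * (channel_means.mean() / (channel_means + 1e-6))
    x = np.clip(x, 0.0, 1.0)
    # 3) nearest-neighbour upsample by an integer factor along both axes
    x = np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)
    return (x * 255.0).astype(np.uint8)

# Example on a synthetic dark 44x44 crop
dark = (np.random.rand(44, 44, 3) * 60).astype(np.uint8)
out = enhance_lip_crop(dark)
print(out.shape)  # (88, 88, 3)
```

In a real pipeline these hand-crafted operations would more likely be replaced or learned end-to-end, and the multi-view synthesis the abstract mentions would additionally require pose estimation and 3D reconstruction, which this sketch does not cover.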
