A Multi-Modal AI Framework for Real-Time American Sign Language Translation
Rohan P. Nandanwar, Diksha Nishane, Himanshu Chamatkar, Om Nishane, P. P. Zode, Lakshmi Madireddy
Published 2025 in the 2025 3rd DMIHER International Conference on Artificial Intelligence in Healthcare, Education and Industry (IDICAIHEI)
ABSTRACT
Effective communication is a fundamental human right, yet significant barriers persist for deaf and non-verbal individuals. This paper introduces a novel multi-modal AI framework for real-time American Sign Language (ASL) translation, aiming to bridge this gap. Our system pioneers the fusion of two distinct data streams: computer vision for recognising hand shapes and movements, and surface electromyography (sEMG) for capturing the underlying muscle-level gesture intent. This dual-modal approach improves robustness, overcoming common limitations of vision-only systems (e.g., occlusion, poor lighting) and sEMG-only systems (e.g., sensor drift). We propose an attention-based fusion network that integrates these data streams. Evaluated on a comprehensive dataset, our multi-modal framework achieves 94.5% accuracy, significantly outperforming single-modality baselines. This work details the system architecture, data fusion strategy, and real-world deployment challenges, offering a viable pathway toward creating more inclusive and effective assistive communication technologies.
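The record does not describe the paper's fusion architecture beyond "attention-based". As a rough illustration only, the sketch below shows one standard way such a fusion could be built: per-frame vision features attend to windowed sEMG features via cross-attention before sign classification. All module names, dimensions, and input shapes are assumptions for illustration, not details taken from the paper.

```python
# Illustrative sketch only: the paper's exact fusion network is not specified
# in this record. This shows a generic cross-attention fusion of a vision
# feature stream and an sEMG feature stream for sign classification.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, vision_dim=512, semg_dim=128, fused_dim=256, num_signs=100):
        super().__init__()
        # Project both modalities into a shared embedding space.
        self.vision_proj = nn.Linear(vision_dim, fused_dim)
        self.semg_proj = nn.Linear(semg_dim, fused_dim)
        # Vision frames (queries) attend over sEMG time steps (keys/values).
        self.cross_attn = nn.MultiheadAttention(fused_dim, num_heads=4, batch_first=True)
        self.classifier = nn.Sequential(
            nn.LayerNorm(fused_dim),
            nn.Linear(fused_dim, num_signs),
        )

    def forward(self, vision_feats, semg_feats):
        # vision_feats: (batch, T_v, vision_dim) per-frame visual features
        # semg_feats:   (batch, T_e, semg_dim)  windowed sEMG features
        q = self.vision_proj(vision_feats)
        kv = self.semg_proj(semg_feats)
        fused, _ = self.cross_attn(q, kv, kv)   # attention-weighted fusion
        pooled = fused.mean(dim=1)              # temporal average pooling
        return self.classifier(pooled)          # sign-class logits

# Example usage with random tensors standing in for real features.
model = CrossModalFusion()
logits = model(torch.randn(2, 30, 512), torch.randn(2, 50, 128))
print(logits.shape)  # torch.Size([2, 100])
```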
PUBLICATION RECORD
- Publication year
2025
- Venue
2025 3rd DMIHER International Conference on Artificial Intelligence in Healthcare, Education and Industry (IDICAIHEI)
- Publication date
2025-11-28