Large Sign Language Models: Toward 3D American Sign Language Translation

Sen Zhang, Xiaoxiao He, Di Liu, Zhaoyang Xia, Mingyu Zhao, Chaowei Tan, Vivian Li, Bo Liu, Dimitris N. Metaxas, Mubbasir Kapadia

Published 2025 in arXiv.org

ABSTRACT

We present Large Sign Language Models (LSLM), a novel framework for translating 3D American Sign Language (ASL) that uses Large Language Models (LLMs) as the backbone and can benefit hearing-impaired individuals' virtual communication. Unlike existing sign language recognition methods that rely on 2D video, our approach directly utilizes 3D sign language data to capture rich spatial, gestural, and depth information in 3D scenes. This enables more accurate and resilient translation, enhancing digital communication accessibility for the hearing-impaired community. Beyond the task of ASL translation, our work explores the integration of complex, embodied multimodal languages into the processing capabilities of LLMs, moving beyond purely text-based inputs to broaden their understanding of human communication. We investigate both direct translation from 3D gesture features to text and an instruction-guided setting where translations can be modulated by external prompts, offering greater flexibility. This work provides a foundational step toward inclusive, multimodal intelligent systems capable of understanding diverse forms of language.

PUBLICATION RECORD

Publication year: 2025
Venue: arXiv.org
Publication date: 2025-11-11
Fields of study: Linguistics, Computer Science
Identifiers: DOI 10.48550/arXiv.2511.08535; arXiv 2511.08535
Source metadata: Semantic Scholar

CITED BY

SignX: Continuous Sign Recognition in Compact Pose-Rich Latent Space
2025 (cites this paper)