Enhancing Robustness of Autoregressive Language Models against Orthographic Attacks via Pixel-based Approach

Han Yang, Jian Lan, Yihong Liu, Hinrich Schütze, Thomas Seidl

Published 2025 in arXiv.org

ABSTRACT

Autoregressive language models are vulnerable to orthographic attacks, in which input text is perturbed with characters from multilingual alphabets, leading to substantial performance degradation. This vulnerability stems primarily from the out-of-vocabulary issue inherent in subword tokenizers and their embeddings. To address this limitation, we propose a pixel-based generative language model that replaces text-based embeddings with pixel-based representations by rendering words as individual images. This design provides stronger robustness to noisy inputs while also extending compatibility to multilingual text across diverse writing systems. We evaluate the proposed method on the multilingual LAMBADA dataset, the WMT24 dataset, and the SST-2 benchmark, demonstrating both its resilience to orthographic noise and its effectiveness in multilingual settings.
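
The out-of-vocabulary effect described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the look-alike map `HOMOGLYPHS`, the helpers `perturb` and `oov_rate`, and the toy word-level vocabulary, which stands in for a real subword tokenizer. It shows only that swapping a few Latin letters for visually similar Cyrillic ones leaves the text looking the same to a human reader while pushing every word out of a fixed vocabulary.

```python
import random

# Hypothetical look-alike map (Latin -> Cyrillic). The perturbation set
# actually used in the paper is not given in this record.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "p": "\u0440",  # Cyrillic small er
    "c": "\u0441",  # Cyrillic small es
}


def perturb(text: str, rate: float, seed: int = 0) -> str:
    """Swap each mappable character for its look-alike with probability `rate`."""
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )


def oov_rate(text: str, vocab: set) -> float:
    """Fraction of whitespace-separated words missing from a toy vocabulary."""
    words = text.split()
    if not words:
        return 0.0
    return sum(w not in vocab for w in words) / len(words)


# Toy word-level vocabulary; a real subword tokenizer would instead fall
# back to rare pieces, bytes, or an unknown token.
vocab = {"the", "cat", "sat", "on", "a", "mat"}
clean = "the cat sat on a mat"
noisy = perturb(clean, rate=1.0)

print(oov_rate(clean, vocab), oov_rate(noisy, vocab))
```

A pixel-based model of the kind the abstract proposes would see nearly identical rendered images for `clean` and `noisy`, which is the intuition behind its claimed robustness.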

PUBLICATION RECORD

Publication year: 2025
Venue: arXiv.org
Publication date: 2025-08-28
Fields of study: Linguistics, Computer Science
Identifiers: DOI 10.48550/arXiv.2508.21206; arXiv 2508.21206
Source metadata: Semantic Scholar

REFERENCES

Overcoming Vocabulary Constraints with Pixel-level Fallback
2025, cited by this paper
WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects
2025, cited by this paper
PIXAR: Auto-Regressive Language Modeling in Pixel Space
2024, influential reference
Evaluating Pixel Language Models on Non-Standardized Languages
2024, cited by this paper
Autoregressive Pre-Training on Pixels and Texts
2024, cited by this paper
PHD: Pixel-Based Language Modeling of Historical Documents
2023, cited by this paper
ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information
2021, cited by this paper
Generative Adversarial Networks
2021, cited by this paper
Robust Open-Vocabulary Translation from Visual Text Representations
2021, cited by this paper
Revisiting Pre-Trained Models for Chinese Natural Language Processing
2020, cited by this paper
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2020, cited by this paper
Language Models are Unsupervised Multitask Learners
2019, cited by this paper
Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems
2019, cited by this paper
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
2018, cited by this paper
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
2016, cited by this paper
Neural Machine Translation of Rare Words with Subword Units
2015, cited by this paper
Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books
2015, cited by this paper
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
2013, cited by this paper
Backpropagation Applied to Handwritten Zip Code Recognition
1989, cited by this paper
