Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems

Steffen Eger, Gözde Gül Şahin, Andreas Rücklé, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych

Published 2019 in North American Chapter of the Association for Computational Linguistics

ABSTRACT

Visual modifications to text are often used to obfuscate offensive comments in social media (e.g., “!d10t”) or as a writing style (“1337” in “leet speak”), among other scenarios. We consider this a new type of adversarial attack in NLP, a setting to which humans are very robust, as our experiments with both simple and more difficult visual perturbations demonstrate. We investigate the impact of visual adversarial attacks on current NLP systems on character-, word-, and sentence-level tasks, showing that both neural and non-neural models are, in contrast to humans, extremely sensitive to such attacks, suffering performance decreases of up to 82%. We then explore three shielding methods (visual character embeddings, adversarial training, and rule-based recovery), which substantially improve the robustness of the models. However, the shielding methods still fall short of the performance achieved in non-attack scenarios, which demonstrates the difficulty of dealing with visual attacks.
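
As a rough illustration of the kind of visual perturbation and rule-based recovery the abstract mentions, the following Python sketch swaps letters for look-alike characters and then maps them back. The substitution table `LOOKALIKES` and the functions `perturb` and `recover` are hypothetical names chosen for this example; they are not taken from the paper, whose actual attack and shielding methods may differ considerably.

```python
import random

# Hypothetical look-alike table (an illustrative assumption, not the paper's resource).
LOOKALIKES = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}
# Inverse table used for the rule-based recovery step.
RECOVER = {v: k for k, v in LOOKALIKES.items()}


def perturb(text, p, rng):
    """Replace each mappable character with its look-alike with probability p."""
    out = []
    for ch in text:
        sub = LOOKALIKES.get(ch.lower())
        if sub is not None and rng.random() < p:
            out.append(sub)
        else:
            out.append(ch)
    return "".join(out)


def recover(text):
    """Map every known look-alike back to a lowercase letter (lossy for real digits)."""
    return "".join(RECOVER.get(ch, ch) for ch in text)


if __name__ == "__main__":
    rng = random.Random(0)
    attacked = perturb("idiot", 1.0, rng)
    print(attacked)           # 1d107
    print(recover(attacked))  # idiot
```

With p = 1.0 the input "idiot" becomes "1d107", and `recover` maps it back. Recovery through a fixed inverse table is lossy: genuine digits are rewritten too (for example, "101" becomes "ioi"), which hints at why simple rules alone may not fully restore performance.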

PUBLICATION RECORD

Publication year
2019
Venue
North American Chapter of the Association for Computational Linguistics
Publication date
2019-02-25
Fields of study
Computer Science
Identifiers
DOI: 10.18653/v1/N19-1165; arXiv: 1903.11508
Source metadata
Semantic Scholar

REFERENCES

Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
2018 cited by this paper
Generating Natural Language Adversarial Examples
2018 cited by this paper
Shielding Google's language toxicity model against adversarial attacks
2018 cited by this paper
Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!
2018 influential reference
Strong Baselines for Neural Semi-Supervised Learning under Domain Shift
2018 cited by this paper
Is it Time to Swish? Comparing Deep Learning Activation Functions Across NLP tasks
2018 influential reference
Semantically Equivalent Adversarial Rules for Debugging NLP models
2018 cited by this paper
Deep Contextualized Word Representations
2018 influential reference
HotFlip: White-Box Adversarial Examples for Text Classification
2017 influential reference
Synthetic and Natural Noise Both Break Neural Machine Translation
2017 cited by this paper
Adversarial Examples for Evaluating Reading Comprehension Systems
2017 cited by this paper
Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning
2017 cited by this paper
Deceiving Google's Perspective API Built for Detecting Toxic Comments
2017 cited by this paper
Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging
2017 influential reference
Learning Character-level Compositionality with Visual Features
2017 cited by this paper
Glyph-aware Embedding of Chinese Characters
2017 cited by this paper
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
2016 cited by this paper
Improving the Robustness of Deep Neural Networks via Stability Training
2016 cited by this paper
Conference on Empirical Methods in Natural Language Processing EMNLP 2016
2016 cited by this paper
Towards Evaluating the Robustness of Neural Networks
2016 cited by this paper
Document classification through image-based character embedding and wildcard training
2016 cited by this paper
Incorporating Nesterov Momentum into Adam
2016 cited by this paper
Still not there? Comparing Traditional Sequence-to-Sequence Models to Encoder-Decoder Neural Networks on Monotone String Translation Tasks
2016 cited by this paper
Dependency Based Embeddings for Sentence Classification Tasks
2016 cited by this paper
Algorithmically Bypassing Censorship on Sina Weibo with Nondeterministic Homophone Substitutions
2015 cited by this paper
Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks
2015 cited by this paper
The Stanford CoreNLP Natural Language Processing Toolkit
2014 cited by this paper
Explaining and Harnessing Adversarial Examples
2014 influential reference
One billion word benchmark for measuring progress in statistical language modeling
2013 cited by this paper
Intriguing properties of neural networks
2013 cited by this paper
Efficient Higher-Order CRFs for Morphological Tagging
2013 influential reference
Vine Pruning for Efficient Multi-Pass Dependency Parsing
2012 cited by this paper
Robust LTS rules with the Combilex speech technology lexicon
2009 cited by this paper
Introduction to the CoNLL-2000 Shared Task Chunking
2000 cited by this paper

CITED BY

Overcoming Black-box Attack Inefficiency with Hybrid and Dynamic Select Algorithms
2025 cites this paper
Leveraging Pre-Trained Language Models for Realistic Adversarial Attacks
2025 cites this paper
Contextual Adversarial Triggers with Masked Language Models
2025 cites this paper
DeepSculpt Attack: Word-Level Adversarial Perturbations on Clinical Text in Electronic Health Records (EHR) Systems
2025 cites this paper
ISA: Test Case Generation Based on Improved Simulated Annealing Algorithm
2025 cites this paper
Adversarial Attacks and Defenses on Large Language Models: A Systematic Review
2025 cites this paper
T2VAttack: Adversarial Attack on Text-to-Video Diffusion Models
2025 cites this paper
From Identification to Obfuscation: A Survey of Cross-Network Mapping and Anti-Mapping Methods
2025 cites this paper
AI Kill Switch for malicious web-based LLM agent
2025 cites this paper
GRAPHTEXTACK: A Realistic Black-Box Node Injection Attack on LLM-Enhanced GNNs
2025 influential citation
See the Text: From Tokenization to Visual Reading
2025 cites this paper
Investigation of Toxicity Detection Models
2025 cites this paper
RedHerring Attack: Testing the Reliability of Attack Detection
2025 cites this paper
Interpreting Deep Neural Networks via Relative Activation-Deactivation Abstractions
2025 cites this paper
OCR-Assisted Masked BERT for Homoglyph Restoration towards Multiple Phishing Text Downstream Tasks
2025 cites this paper
Advancing text adversarial example generation using large language models
2025 cites this paper
Enhancing Robustness of Autoregressive Language Models against Orthographic Attacks via Pixel-based Approach
2025 cites this paper
Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation
2025 cites this paper
FactEval: Evaluating the Robustness of Fact Verification Systems in the Era of Large Language Models
2025 cites this paper
Tokenization is Sensitive to Language Variation
2025 cites this paper
BitAbuse: A Dataset of Visually Perturbed Texts for Defending Phishing Attacks
2025 cites this paper
Confidence Elicitation: A New Attack Vector for Large Language Models
2025 cites this paper
Close or Cloze? Assessing the Robustness of Large Language Models to Adversarial Perturbations via Word Recovery
2025 influential citation
Task-Oriented Adversarial Attacks for Aspect-Based Sentiment Analysis Models
2025 cites this paper
Textual variations in social media text processing applications: challenges, solutions, and trends
2025 cites this paper
Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks
2025 cites this paper
Investigating the Impact of Model Instability on Explanations and Uncertainty
2024 cites this paper
Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks
2024 cites this paper
Benchmarking Large Multimodal Models against Common Corruptions
2024 cites this paper
PIXAR: Auto-Regressive Language Modeling in Pixel Space
2024 cites this paper
Quantum theory-inspired inter-sentence semantic interaction model for textual adversarial defense
2024 cites this paper
BinarySelect to Improve Accessibility of Black-Box Attack Research
2024 cites this paper
Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages
2024 cites this paper
TSCheater: Generating High-Quality Tibetan Adversarial Texts via Visual Similarity
2024 cites this paper
Evading Toxicity Detection with ASCII-art: A Benchmark of Spatial Attacks on Moderation Systems
2024 cites this paper
Study on relationship between adversarial texts and language errors: a human-computer interaction perspective
2024 cites this paper
Legilimens: Practical and Unified Content Moderation for Large Language Model Services
2024 cites this paper
VertAttack: Taking Advantage of Text Classifiers’ Horizontal Vision
2024 cites this paper
Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach
2024 cites this paper
Evaluating the Validity of Word-level Adversarial Attacks with Large Language Models
2024 cites this paper
IAE: Irony-Based Adversarial Examples for Sentiment Analysis Systems
2024 cites this paper
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
2024 cites this paper
An Adversarial Text Generation Framework Based on Multi-View and Extended Semantic Space
2024 cites this paper
A Survey of Adversarial Attacks: An Open Issue for Deep Learning Sentiment Analysis Models
2024 cites this paper
PAD: A Robustness Enhancement Ensemble Method via Promoting Attention Diversity
2024 influential citation
Pay Attention to the Robustness of Chinese Minority Language Models! Syllable-level Textual Adversarial Attack on Tibetan Script
2024 cites this paper
ORTicket: Let One Robust BERT Ticket Transfer across Different Tasks
2024 cites this paper
SOBR: A Corpus for Stylometry, Obfuscation, and Bias on Reddit
2024 cites this paper
Multi-Granularity Tibetan Textual Adversarial Attack Method Based on Masked Language Model
2024 cites this paper
Typos that Broke the RAG's Back: Genetic Attack on RAG Pipeline by Simulating Documents in the Wild via Low-level Perturbations
2024 cites this paper
Towards Action Hijacking of Large Language Model-based Agent
2024 cites this paper
On the Validity of Traditional Vulnerability Scoring Systems for Adversarial Attacks Against LLMs
2024 cites this paper
SemRoDe: Macro Adversarial Training to Learn Representations that are Robust to Word-Level Attacks
2024 cites this paper
Defense against adversarial attacks: robust and efficient compressed optimized neural networks
2024 cites this paper
Robust Neural Machine Translation for Abugidas by Glyph Perturbation
2024 influential citation
The Impact of Quantization on the Robustness of Transformer-based Text Classifiers
2024 cites this paper
Fooling the Textual Fooler via Randomizing Latent Representations
2023 cites this paper
Adversarial NLP for Social Network Applications: Attacks, Defenses, and Research Directions
2023 cites this paper
CrypText: Database and Interactive Toolkit of Human-Written Text Perturbations in the Wild
2023 influential citation
On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex
2023 cites this paper
TextShield: Beyond Successfully Detecting Adversarial Sentences in Text Classification
2023 cites this paper
MTTM: Metamorphic Testing for Textual Content Moderation Software
2023 cites this paper
Backdoor Learning for NLP: Recent Advances, Challenges, and Future Research Directions
2023 cites this paper
Learning the Legibility of Visual Text Perturbations
2023 influential citation
Verifying the robustness of automatic credibility assessment
2023 cites this paper
NoisyHate: Benchmarking Content Moderation Machine Learning Models with Human-Written Perturbations Online
2023 cites this paper
No more Reviewer #2: Subverting Automatic Paper-Reviewer Assignment using Adversarial Learning
2023 cites this paper
In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT
2023 influential citation
Masked Language Model Based Textual Adversarial Example Detection
2023 cites this paper
Additive Feature Attribution Explainable Methods to Craft Adversarial Attacks for Text Classification and Text Regression
2023 cites this paper
Using Punctuation as an Adversarial Attack on Deep Learning-Based NLP Systems: An Empirical Study
2023 cites this paper
Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility
2023 cites this paper
From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework
2023 cites this paper
Generation-based parallel particle swarm optimization for adversarial text attacks
2023 cites this paper
Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks
2023 influential citation
SCAT: Robust Self-supervised Contrastive Learning via Adversarial Training for Text Classification
2023 cites this paper
RMLM: A Flexible Defense Framework for Proactively Mitigating Word-level Adversarial Attacks
2023 cites this paper
Multi-level Adversarial Training for Stock Sentiment Prediction
2023 cites this paper
Hiding Backdoors within Event Sequence Data via Poisoning Attacks
2023 influential citation
LEAP: Efficient and Automated Test Method for NLP Software
2023 cites this paper
An Efficient Character-Level Adversarial Attack Inspired by Textual Variations in Online Social Media Platforms
2023 cites this paper
SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution
2023 cites this paper
The Trickle-down Impact of Reward (In-)consistency on RLHF
2023 cites this paper
BODEGA: Benchmark for Adversarial Example Generation in Credibility Assessment
2023 cites this paper
CT-GAT: Cross-Task Generative Adversarial Attack based on Transferability
2023 cites this paper
Defense against adversarial attacks via textual embeddings based on semantic associative field
2023 cites this paper
Formalizing Robustness Against Character-Level Perturbations for Neural Network Language Models
2023 cites this paper
Generating Valid and Natural Adversarial Examples with Large Language Models
2023 cites this paper
SenTest: Evaluating Robustness of Sentence Encoders
2023 cites this paper
Testing Coreference Resolution Systems without Labeled Test Sets
2023 cites this paper
RNNS: Representation Nearest Neighbor Search Black-Box Attack on Code Models
2023 cites this paper
RobustQA: A Framework for Adversarial Text Generation Analysis on Question Answering Systems
2023 cites this paper
Adversarial Text Generation by Search and Learning
2023 cites this paper
Adversarial Text Perturbation Generation and Analysis
2023 cites this paper
A Novel Evaluation Framework for Assessing Resilience Against Prompt Injection Attacks in Large Language Models
2023 cites this paper
An adversarial text generation method at multiple granularity
2023 cites this paper
Identifying Adversarial Attacks on Text Classifiers
2022 influential citation
Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations
2022 cites this paper
Robust Natural Language Processing: Recent Advances, Challenges, and Future Directions
2022 cites this paper