Robust Image Classifiers Fail Under Shifted Adversarial Perturbations
Published 2025 in ACM Symposium on Document Engineering

ABSTRACT
Non-robustness of image classifiers to subtle adversarial perturbations is a well-known failure mode. Defenses against such attacks are typically evaluated by measuring the error rate on perturbed versions of the natural test set, quantifying the worst-case performance within a specified perturbation budget. However, these evaluations often isolate specific perturbation types, underestimating the adaptability of real-world adversaries who can modify or compose attacks in unforeseen ways. In this work, we show that models considered robust to strong attacks, such as AutoAttack, can be compromised by a simple modification of the weaker FGSM attack, where the adversarial perturbation is slightly transformed prior to being added to the input. Despite the attack's simplicity, robust models that perform well against standard FGSM become vulnerable to this variant. These findings suggest that current defenses may generalize poorly beyond their assumed threat models and can achieve inflated robustness scores under narrowly defined evaluation settings.
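
The abstract does not spell out the exact transformation applied to the perturbation; the title suggests a spatial shift. As a minimal sketch, assuming a PyTorch image classifier and a circular pixel shift of the FGSM perturbation, the attack could look like the following. The function name shifted_fgsm, the default budget of 8/255, and the torch.roll-based shift are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn.functional as F

def shifted_fgsm(model, x, y, epsilon=8 / 255, shift=(1, 1)):
    """Compute an FGSM perturbation, then spatially shift (roll) it
    before adding it to the input. Hypothetical sketch: the helper name,
    budget, and roll-based shift are assumptions for illustration."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Standard FGSM step: signed gradient scaled by the L-infinity budget.
    delta = epsilon * x.grad.sign()
    # The only modification: translate the perturbation by a few pixels
    # (here a circular shift over the two spatial dimensions).
    delta = torch.roll(delta, shifts=shift, dims=(-2, -1))
    # Apply the shifted perturbation and clip back to the valid image range.
    return (x + delta).clamp(0.0, 1.0).detach()
```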
PUBLICATION RECORD
- Publication year: 2025
- Venue: ACM Symposium on Document Engineering
- Publication date: 2025-08-27
- Fields of study: Computer Science
- Source metadata: Semantic Scholar