Inspecting Prediction Confidence for Detecting Black-Box Backdoor Attacks

Tong Wang, Yuan Yao, Feng Xu, Miao Xu, Shengwei An, Ting Wang

Published 2024 in AAAI Conference on Artificial Intelligence

ABSTRACT

Backdoor attacks have been shown to be a serious security threat against deep learning models, and various defenses have been proposed to detect whether a model is backdoored. However, as a recent black-box attack indicates, existing defenses can be easily bypassed by implanting the backdoor in the frequency domain. To address this, we propose a new defense, DTInspector, against black-box backdoor attacks, based on a new observation about the prediction confidence of learning models: to achieve a high attack success rate with a small amount of poisoned data, backdoor attacks usually cause the model to exhibit statistically higher prediction confidence on the poisoned samples. We provide both theoretical and empirical evidence for the generality of this observation. DTInspector then carefully examines the prediction confidences of data samples and decides whether a backdoor exists using the shortcut nature of backdoor triggers. Extensive evaluations on six backdoor attacks, four datasets, and three advanced attack types demonstrate the effectiveness of the proposed defense.
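
The observation in the abstract can be illustrated with a minimal sketch. This is not DTInspector's actual procedure, which the abstract does not specify. The sketch assumes that "prediction confidence" means the softmax probability of the predicted class, and the function names and logit values are hypothetical, chosen only for illustration.

```python
import math
from statistics import mean


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_confidence(logits):
    """Assumed definition: probability assigned to the predicted (arg-max) class."""
    return max(softmax(logits))


def mean_confidence_gap(suspect_logits, reference_logits):
    """Mean confidence on suspect samples minus mean confidence on reference samples."""
    suspect = mean(prediction_confidence(z) for z in suspect_logits)
    reference = mean(prediction_confidence(z) for z in reference_logits)
    return suspect - reference


# Hypothetical logits, made up for illustration; not data from the paper.
reference_logits = [[2.0, 1.0, 0.5], [1.5, 1.2, 0.3]]
suspect_logits = [[9.0, 0.5, 0.1], [8.5, 0.2, 0.4]]

gap = mean_confidence_gap(suspect_logits, reference_logits)
print(f"mean confidence gap: {gap:.3f}")
```

A positive gap only says that one group is predicted more confidently than the other. Per the abstract, the defense additionally relies on the shortcut nature of backdoor triggers to decide whether a backdoor exists, which this sketch does not model.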

PUBLICATION RECORD

Publication year: 2024
Venue: AAAI Conference on Artificial Intelligence
Publication date: 2024-03-24
Fields of study: Computer Science
Identifiers: DOI 10.1609/aaai.v38i1.27780
Source metadata: Semantic Scholar

REFERENCES

An Invisible Black-box Backdoor Attack through Frequency Domain
2022, influential reference
Backdoor Defense via Decoupling the Training Process
2022, cited by this paper
Anti-Backdoor Learning: Training Clean Models on Poisoned Data
2021, cited by this paper
Backdoor Scanning for Deep Neural Networks through K-Arm Optimization
2021, cited by this paper
Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch
2021, cited by this paper
SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics
2021, cited by this paper
RABA: A Robust Avatar Backdoor Attack on Deep Neural Network
2021, cited by this paper
Adversarial Neuron Pruning Purifies Backdoored Deep Models
2021, cited by this paper
Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review
2020, cited by this paper
Towards Inspecting and Eliminating Trojan Backdoors in Deep Neural Networks
2020, cited by this paper
Input-Aware Dynamic Backdoor Attack
2020, cited by this paper
Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features
2020, cited by this paper
Backdoor Attacks Against Deep Learning Systems in the Physical World
2020, cited by this paper
Invisible Backdoor Attack with Sample-Specific Triggers
2020, cited by this paper
Clean-Label Backdoor Attacks on Video Recognition Models
2020, cited by this paper
Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness
2020, cited by this paper
Shortcut learning in deep neural networks
2020, cited by this paper
Live Trojan Attacks on Deep Neural Networks
2020, cited by this paper
Dynamic Backdoor Attacks Against Machine Learning Models
2020, cited by this paper
An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks
2020, cited by this paper
A Tale of Evil Twins: Adversarial Inputs versus Poisoned Models
2019, cited by this paper
Certified Adversarial Robustness via Randomized Smoothing
2019, cited by this paper
STRIP: a defence against trojan attacks on deep neural networks
2019, influential reference
Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks
2019, influential reference
NIC: Detecting Adversarial Samples with Neural Network Invariant Checking
2019, cited by this paper
Bypassing Backdoor Detection Algorithms in Deep Learning
2019, cited by this paper
Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs
2019, influential reference
Model Agnostic Defence Against Backdoor Attacks in Machine Learning
2019, cited by this paper
DeepInspect: A Black-box Trojan Detection and Mitigation Framework for Deep Neural Networks
2019, cited by this paper
Latent Backdoor Attacks on Deep Neural Networks
2019, cited by this paper
Hidden Trigger Backdoor Attacks
2019, cited by this paper
Detecting AI Trojans Using Meta Neural Analysis
2019, influential reference
ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation
2019, influential reference
Februus: Input Purification Defense Against Trojan Attacks on Deep Neural Network Systems
2019, cited by this paper
Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
2018, influential reference
Trojaning Attack on Neural Networks
2018, cited by this paper
Spectral Signatures in Backdoor Attacks
2018, cited by this paper
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
2017, cited by this paper
Universal Adversarial Perturbations
2016, cited by this paper
The German Traffic Sign Recognition Benchmark: A multi-class classification competition
2011, cited by this paper
Attribute and simile classifiers for face verification
2009, influential reference
Learning Multiple Layers of Features from Tiny Images
2009, influential reference
ImageNet: A large-scale hierarchical image database
2009, cited by this paper

CITED BY

The answer lies within: Detecting Trojans from DNNs' inherent characteristics
2026, cites this paper
MCL++: Strengthening Defenses Against Backdoor Poisoning Attacks
2025, cites this paper
Exploring Graph Neural Backdoors in Vehicular Networks: Fundamentals, Methodologies, Applications, and Future Perspectives
2025, cites this paper
Clean-label backdoor attack via sample-customized feature alignment
2025, cites this paper
Perturbation distillation and backdoor feature induction for universal defense in deep vision models
2025, cites this paper
P2P: A Poison-to-Poison Remedy for Reliable Backdoor Defense in LLMs
2025, cites this paper
Backdoor Attack and Defense on Deep Learning: A Survey
2025, cites this paper
Distill To Detect: Amplifying Anomalies in Backdoor Models through Knowledge Distillation
2025, cites this paper
CBPF: A Novel Method for Filtering Poisoned Data Based on Composite Backdoor Attacks
2025, cites this paper
AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection
2025, cites this paper
Krait: A Backdoor Attack Against Graph Prompt Tuning
2024, cites this paper
Need for Speed: Taming Backdoor Attacks with Speed and Precision
2024, influential citation
CBPF: Filtering Poisoned Data Based on Composite Backdoor Attack
2024, cites this paper
Graph Neural Backdoor: Fundamentals, Methodologies, Applications, and Future Directions
2024, cites this paper