Explanation-Guided Diagnosis of Machine Learning Evasion Attacks

Published 2021 in Security and Privacy in Communication Networks

ABSTRACT

Machine Learning (ML) models are susceptible to evasion attacks. Evasion accuracy is typically assessed using the aggregate evasion rate, and it is an open question whether the aggregate evasion rate enables feature-level diagnosis of the effect of adversarial perturbations on evasive predictions. In this paper, we introduce a novel framework that harnesses explainable ML methods to guide high-fidelity assessment of ML evasion attacks. Our framework enables explanation-guided correlation analysis between pre-evasion perturbations and post-evasion explanations. Towards systematic assessment of ML evasion attacks, we propose and evaluate a novel suite of model-agnostic metrics for sample-level and dataset-level correlation analysis. Using malware and image classifiers, we conduct comprehensive evaluations across diverse model architectures and complementary feature representations. Our explanation-guided correlation analysis reveals correlation gaps between adversarial samples and the corresponding perturbations performed on them. Using a case study on explanation-guided evasion, we show the broader use of our methodology for assessing the robustness of ML models.

PUBLICATION RECORD

Publication year
2021
Venue
Security and Privacy in Communication Networks
Publication date
2021-06-30
Fields of study
Computer Science
Identifiers
DOI 10.1007/978-3-030-90019-9_11 arXiv 2106.15820
Source metadata
Semantic Scholar

REFERENCES

Practical Traffic-space Adversarial Attacks on Learning-based NIDSs
2020, cited by this paper
Best-Effort Adversarial Approximation of Black-Box Malware Classifiers
2020, cited by this paper
Deep Reinforcement Adversarial Learning Against Botnet Evasion Attacks
2020, cited by this paper
Can We Trust Your Explanations? Sanity Checks for Interpreters in Android Malware Analysis
2020, influential reference
DeepCC: a novel deep learning-based framework for cancer molecular subtype classification
2019, cited by this paper
Explaining Vulnerabilities of Deep Learning to Adversarial Malware Binaries
2019, cited by this paper
Evaluating Explanation Methods for Deep Learning in Security
2019, influential reference
Intriguing Properties of Adversarial ML Attacks in the Problem Space
2019, cited by this paper
When Explainability Meets Adversarial Learning: Detecting Adversarial Examples using SHAP Signatures
2019, cited by this paper
Fooling Neural Network Interpretations via Adversarial Model Manipulation
2019, cited by this paper
Exploring Adversarial Examples in Malware Detection
2018, influential reference
Local Rule-Based Explanations of Black Box Decision Systems
2018, cited by this paper
Deceiving End-to-End Deep Learning Malware Detectors using Adversarial Examples
2018, cited by this paper
EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models
2018, influential reference
Interpretable Deep Learning under Fire
2018, cited by this paper
Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning
2018, cited by this paper
Anchors: High-Precision Model-Agnostic Explanations
2018, cited by this paper
Bringing a GAN to a Knife-Fight: Adapting Malware Communication to Avoid Detection
2018, cited by this paper
LEMNA: Explaining Deep Learning based Security Applications
2018, cited by this paper
Adversarial Malware Binaries: Evading Deep Learning for Malware Detection in Executables
2018, cited by this paper
Adversarial Deep Learning for Robust Detection of Binary Encoded Malware
2018, cited by this paper
Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN
2017, cited by this paper
Black-Box Attacks against RNN based Malware Detection Algorithms
2017, cited by this paper
Interpretable Explanations of Black Boxes by Meaningful Perturbation
2017, cited by this paper
Generic Black-Box End-to-End Attack Against State of the Art API Call Based Malware Classifiers
2017, cited by this paper
Learning Important Features Through Propagating Activation Differences
2017, influential reference
Towards Deep Learning Models Resistant to Adversarial Attacks
2017, influential reference
Yes, Machine Learning Can Be More Secure! A Case Study on Android Malware Detection
2017, cited by this paper
SmoothGrad: removing noise by adding noise
2017, cited by this paper
Deep Reinforcement Learning framework for Autonomous Driving
2017, cited by this paper
A Unified Approach to Interpreting Model Predictions
2017, influential reference
Interpretation of Neural Networks is Fragile
2017, cited by this paper
Adversarial Examples for Malware Detection
2017, influential reference
Malware Detection by Eating a Whole EXE
2017, influential reference
Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning
2017, cited by this paper
Malware Detection in Adversarial Settings: Exploiting Feature Evolutions and Confusions in Android Apps
2017, cited by this paper
Adversarial Machine Learning at Scale
2016, influential reference
Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
2016, cited by this paper
Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
2016, cited by this paper
“Why Should I Trust You?”: Explaining the Predictions of Any Classifier
2016, cited by this paper
Understanding Neural Networks through Representation Erasure
2016, cited by this paper
Towards Evaluating the Robustness of Neural Networks
2016, influential reference
Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers
2016, cited by this paper
Practical Black-Box Attacks against Machine Learning
2016, cited by this paper
Striving for Simplicity: The All Convolutional Net
2014, cited by this paper
DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket
2014, cited by this paper
Practical Evasion of a Learning-Based Classifier: A Case Study
2014, cited by this paper
Explaining and Harnessing Adversarial Examples
2014, influential reference
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
2013, cited by this paper
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
2012, cited by this paper
ImageNet classification with deep convolutional neural networks
2012, cited by this paper
The mnist database of handwritten digits
2005, influential reference
A Value for n-person Games
1988, cited by this paper
A Value for n-Person Games
1953, cited by this paper

CITED BY

Techniques and metrics for evasion attack mitigation
2026, cites this paper
DeepProv: Behavioral Characterization and Repair of Neural Networks via Inference Provenance Graph Analysis
2025, influential citation
Keyed randomization with adversarial failure curves and moving target defense
2025, cites this paper
Systems-Theoretic and Data-Driven Security Analysis in ML-enabled Medical Devices
2025, cites this paper
Investigating adversarial attacks in software analytics via machine learning explainability
2024, cites this paper
An explainable AI (XAI) model for landslide susceptibility modeling
2023, cites this paper
ReinforSec: An Automatic Generator of Synthetic Malware Samples and Denial-of-Service Attacks through Reinforcement Learning
2023, cites this paper
Privacy Preservation in Artificial Intelligence and Extended Reality (AI-XR) Metaverses: A Survey
2023, cites this paper
Mitigating adversarial evasion attacks by deep active learning for medical image classification
2022, cites this paper
The Role of Machine Learning in Cybersecurity
2022, cites this paper
Sensitivity of Machine Learning Approaches to Fake and Untrusted Data in Healthcare Domain
2022, cites this paper
Concept-based Adversarial Attacks: Tricking Humans and Classifiers Alike
2022, cites this paper
amsqr at MLSEC-2021: Thwarting Adversarial Malware Evasion with a Defense-in-Depth
2021, cites this paper
EG-Booster: Explanation-Guided Booster of ML Evasion Attacks
2021, cites this paper