BIM-Based Adversarial Attacks Against Speech Deepfake Detectors

Wendy Edda Wang, Davide Salvi, Viola Negroni, Daniele Ugo Leonzio, P. Bestagini, Stefano Tubaro

Published 2025 in Electronics

ABSTRACT

Automatic Speaker Verification (ASV) systems are increasingly employed to secure access to services and facilities. However, recent advances in speech deepfake generation pose serious threats to their reliability. Modern speech synthesis models can convincingly imitate a target speaker’s voice and generate realistic synthetic audio, potentially enabling unauthorized access through ASV systems. To counter these threats, forensic detectors have been developed to distinguish between real and fake speech. Although these models achieve strong performance, their deep learning nature makes them susceptible to adversarial attacks, i.e., carefully crafted, imperceptible perturbations of the audio signal that cause the model to misclassify its input. In this paper, we explore adversarial attacks targeting speech deepfake detectors. Specifically, we analyze the effectiveness of Basic Iterative Method (BIM) attacks applied in both the time and frequency domains under white- and black-box conditions. Additionally, we propose an ensemble-based attack strategy designed to simultaneously target multiple detection models. This approach generates adversarial examples with balanced effectiveness across the ensemble, enhancing transferability to unseen models. Our experimental results show that, although crafting universally transferable attacks remains challenging, it is possible to fool state-of-the-art detectors using minimal, imperceptible perturbations, highlighting the need for more robust defenses in speech deepfake detection.
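
As an illustration of the general idea behind a time-domain BIM attack, the following is a minimal sketch and not the authors' implementation. It assumes toy logistic "detectors" of the form p(fake | x) = sigmoid(w . x + b) acting directly on a short list of samples in [-1, 1]. All names (`bim_attack`, `loss_gradient`, `fake_probability`) and the parameters `eps`, `alpha`, and `steps` are hypothetical. When several detectors are passed, their input gradients are simply averaged; this is an assumed stand-in for an ensemble rule, and the paper's balancing strategy may differ.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fake_probability(x, w, b):
    """Toy detector (assumption): probability that x is fake speech."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss_gradient(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input x,
    where y is the true label (1 = fake, 0 = real)."""
    p = fake_probability(x, w, b)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def bim_attack(x, y, detectors, eps=0.05, alpha=0.01, steps=10):
    """Untargeted BIM: take small steps along the sign of the loss
    gradient, keeping the result within an L-infinity ball of radius
    eps around the original signal x."""
    x_adv = list(x)
    for _ in range(steps):
        # Average input gradients over all detectors (assumed ensemble rule).
        grads = [loss_gradient(x_adv, y, w, b) for w, b in detectors]
        mean_grad = [sum(g) / len(grads) for g in zip(*grads)]
        for i, g in enumerate(mean_grad):
            stepped = x_adv[i] + alpha * sign(g)
            # Project onto the eps-ball, then onto the valid range [-1, 1].
            stepped = min(max(stepped, x[i] - eps), x[i] + eps)
            x_adv[i] = min(max(stepped, -1.0), 1.0)
    return x_adv

# Usage: perturb a "fake" input (y = 1) so the toy detector is less sure.
x = [0.3, -0.2, 0.1, 0.4]
w, b = [2.0, -1.0, 0.5, 1.5], 0.0
x_adv = bim_attack(x, 1, [(w, b)])
print(fake_probability(x, w, b), fake_probability(x_adv, w, b))
```

The projection step is what keeps the perturbation small: no sample moves by more than `eps`, however many iterations are run. A real attack would obtain gradients by backpropagating through a neural detector instead of using the closed-form expression above.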

PUBLICATION RECORD

Publication year
2025
Venue
Electronics
Publication date
2025-07-24
Identifiers
DOI 10.3390/electronics14152967
Source metadata
Semantic Scholar

REFERENCES

Artificial intelligence in the battle against disinformation and misinformation: a systematic review of challenges and approaches
2025, cited by this paper
LoCal: Logical and Causal Fact-Checking with LLM-Based Multi-Agents
2025, cited by this paper
Deepfake Media Forensics: Status and Future Challenges
2025, cited by this paper
Adversarial Attacks on Automatic Speech Recognition (ASR): A Survey
2024, cited by this paper
Explore the World of Audio Deepfakes: A Guide to Detection Techniques for Non-Experts
2024, cited by this paper
A Survey on Speech Deepfake Detection
2024, cited by this paper
Audio-deepfake detection: Adversarial attacks and countermeasures
2024, cited by this paper
Hybrid Transformer Architectures With Diverse Audio Features for Deepfake Speech Classification
2024, cited by this paper
Ensemble Adversarial Defenses and Attacks in Speaker Verification Systems
2024, cited by this paper
Towards the transferable audio adversarial attack via ensemble methods
2023, cited by this paper
Deepfake Audio Detection via MFCC Features Using Machine Learning
2022, cited by this paper
Adversarial Attack and Defense Strategies of Speaker Recognition Systems: A Survey
2022, cited by this paper
A survey on adversarial attacks and defences
2021, cited by this paper
AdvPulse: Universal, Synchronization-free, and Targeted Audio Adversarial Attacks via Subsecond Perturbations
2020, cited by this paper
Emotions Don't Lie: An Audio-Visual Deepfake Detection Method using Affective Cues
2020, cited by this paper
Adversarial Examples for Automatic Speech Recognition: Attacks and Countermeasures
2019, cited by this paper
ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech
2019, cited by this paper
Robustness of Adversarial Attacks in Sound Event Classification
2019, cited by this paper
A Unified Approach to Interpreting Model Predictions
2017, cited by this paper
SUPERSEDED - CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit
2016, cited by this paper
A Light CNN for Deep Face Representation With Noisy Labels
2015, cited by this paper
Speaker Verification Using Adapted Gaussian Mixture Models
2000, cited by this paper

CITED BY

Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race
2025, cites this paper
TextShelter: Text Adversarial Example Defense Based on Input Reconstruction
2025, cites this paper