Self-supervised learning with diffusion-based multichannel speech enhancement for speaker verification under noisy conditions

Sandipana Dowerah, Ajinkya Kulkarni, Romain Serizel, D. Jouvet

Published 2023 in Interspeech

ABSTRACT

The paper introduces Diff-Filter, a multichannel speech enhancement approach based on a diffusion probabilistic model, for improving speaker verification performance under noisy and reverberant conditions. It also presents a new two-step training procedure that takes advantage of self-supervised learning. In the first stage, the Diff-Filter is trained to perform time-domain speech filtering using a score-based diffusion model. In the second stage, the Diff-Filter is jointly optimized with a pre-trained ECAPA-TDNN speaker verification model under a self-supervised learning framework. We present a novel loss based on the equal error rate (EER). This loss is used to conduct self-supervised learning on a dataset that has no speaker labels. The proposed approach is evaluated on MultiSV, a multichannel speaker verification dataset, and shows significant improvements in performance under noisy multichannel conditions.
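The abstract's loss is built around the equal error rate, the operating point at which a verification system's false acceptance rate (FAR) equals its false rejection rate (FRR). As a minimal sketch, assuming higher scores mean "same speaker", the plain-Python code below computes the EER from lists of target and non-target trial scores by sweeping every observed score as a threshold. This is the standard, non-differentiable evaluation metric only; it is not the paper's training loss, whose differentiable form the abstract does not describe, and the function names `far_frr` and `equal_error_rate` are hypothetical.

```python
from typing import Sequence, Tuple


def far_frr(target: Sequence[float], nontarget: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """FAR and FRR when a trial is accepted if its score >= threshold."""
    # False acceptance: a different-speaker trial scores at or above the threshold.
    far = sum(s >= threshold for s in nontarget) / len(nontarget)
    # False rejection: a same-speaker trial scores below the threshold.
    frr = sum(s < threshold for s in target) / len(target)
    return far, frr


def equal_error_rate(target: Sequence[float],
                     nontarget: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold), sweeping each observed score as a threshold.

    With finite trial lists FAR and FRR rarely coincide exactly, so the EER
    is approximated as their mean at the threshold where they are closest.
    """
    if not target or not nontarget:
        raise ValueError("need at least one target and one non-target score")
    best = None  # (|FAR - FRR|, EER estimate, threshold)
    for t in sorted(set(target) | set(nontarget)):
        far, frr = far_frr(target, nontarget, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    _, eer, threshold = best
    return eer, threshold
```

For example, `equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.3, 0.5, 0.6])` returns `(0.25, 0.6)`: at threshold 0.6 one of four non-target trials is accepted and one of four target trials is rejected. A training objective would need a smooth surrogate of these step counts, which is where an EER-based loss differs from this evaluation sketch.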

PUBLICATION RECORD

Publication year: 2023
Venue: Interspeech
Publication date: 2023-07-05
Fields of study: Computer Science, Engineering
Identifiers: DOI 10.48550/arXiv.2307.02244; arXiv 2307.02244
Source metadata: Semantic Scholar

REFERENCES

Joint Optimization of Diffusion Probabilistic-Based Multichannel Speech Enhancement with Far-Field Speaker Verification (2023)
Conditional Diffusion Probabilistic Model for Speech Enhancement (2022)
Augmentation Adversarial Training for Self-Supervised Speaker Representation Learning (2022)
Self-Supervised Speaker Verification Using Dynamic Loss-Gate and Label Correction (2022)
Universal Speech Enhancement with Score-based Diffusion (2022)
Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain (2022)
Maximum Likelihood Training of Score-Based Diffusion Models (2021)
MultiSV: Dataset for Far-Field Multi-Channel Speaker Verification (2021)
Log-Likelihood-Ratio Cost Function as Objective Loss for Speaker Verification Systems (2021)
A Study on Speech Enhancement Based on Diffusion Probabilistic Model (2021)
Restoring degraded speech via a modified diffusion model (2021)
A Flow-Based Deep Latent Variable Model for Speech Spectrogram Modeling and Enhancement (2020)
VoxCeleb: Large-scale speaker verification in the wild (2020)
Disentangled Speech Embeddings Using Cross-Modal Self-Supervision (2020)
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification (2020)
DiffWave: A Versatile Diffusion Model for Audio Synthesis (2020)
Adaptive Neural Speech Enhancement with a Denoising Variational Autoencoder (2020)
Self-Supervised Text-Independent Speaker Verification Using Prototypical Momentum Contrastive Learning (2020)
A SI-SDR Loss Function based Monaural Source Separation (2020)
MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement (2019)
Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation (2018)
A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement (2018)
Time-Frequency Masking-Based Speech Enhancement Using Generative Adversarial Network (2018)
Triplet Loss Based Cosine Similarity Metric Learning for Text-independent Speaker Recognition (2018)
Statistical Speech Enhancement Based on Probabilistic Integration of Variational Autoencoder and Non-Negative Matrix Factorization (2017)
SEGAN: Speech Enhancement Generative Adversarial Network (2017)
Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification (2017)
Gaussian Error Linear Units (GELUs) (2016)
FABIOLE, a Speech Database for Forensic Speaker Comparison (2016)
MUSAN: A Music, Speech, and Noise Corpus (2015)
Librispeech: An ASR corpus based on public domain audio books (2015)
Numerical Solution of Stochastic Differential Equations (2015)
Interpretation and Generalization of Score Matching (2009)

CITED BY

Trainable multi-channel front-ends for joint beamforming and speaker embedding extraction (2026)
Graph-Guided Spatial-Temporal Diffusion Model for Speech Enhancement with Microphone Array (2025)
A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms (2024)