Distant-talking speaker identification by generalized spectral subtraction-based dereverberation and its efficient computation

Published 2014 in EURASIP Journal on Audio, Speech, and Music Processing

ABSTRACT

A dereverberation method based on generalized spectral subtraction (GSS) using the multi-channel least mean squares (MCLMS) algorithm was proposed previously, and speech recognition experiments showed that it significantly outperforms conventional methods. In this paper, we apply the method to distant-talking (far-field) speaker recognition. For far-field speech, however, GSS-based dereverberation combined with clean speech models degrades speaker recognition performance, possibly because the dereverberation introduces distortion between clean speech and dereverberant speech. We address this problem by training speaker models on dereverberant speech obtained by suppressing reverberation from arbitrary artificial reverberant speech. Furthermore, we propose an efficient method for computing a combination of the likelihoods of dereverberant speech obtained with multiple compensation parameter sets, which addresses the problem of determining optimal compensation parameters for GSS. We report a speaker recognition experiment on large-scale far-field speech from reverberant environments that differ from the training environments. The proposed GSS-based dereverberation method achieves a recognition rate of 92.2%, compared with 49.0% and 88.4% for conventional cepstral mean normalization with delay-and-sum beamforming using a clean speech model and a reverberant speech model, respectively. We also compare the proposed method with another dereverberation technique, multi-step linear prediction-based spectral subtraction (MSLP-GSS), which achieves 90.6%. Using multiple compensation parameter sets further improves recognition performance, raising the recognition rate to 93.6%. Finally, we apply the method in a real environment using the optimal compensation parameters estimated in an artificial environment; it achieves a recognition rate of 87.8%, compared with 72.5% for delay-and-sum beamforming using a reverberant speech model.
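
The two ideas named in the abstract, generalized spectral subtraction and the combination of likelihoods over several compensation parameter sets, can be illustrated with a rough sketch. The abstract gives no formulas, so everything below is an assumption: the function names, the parameters `alpha` (subtraction weight), `beta` (spectral floor) and `exponent`, the floored power-domain subtraction rule, and the equal-weight sum of log-likelihoods are illustrative choices, not the authors' implementation.

```python
def generalized_spectral_subtraction(observed, late_reverb,
                                     alpha=1.0, beta=0.1, exponent=1.0):
    """Subtract an estimated late-reverberation magnitude spectrum.

    Assumed rule (not taken from the paper):
        |S|^(2n) = max(|X|^(2n) - alpha * |R|^(2n), beta * |X|^(2n))
    where n is `exponent`, X the observed magnitudes, R the reverberation
    estimate. exponent=1.0 is power subtraction, 0.5 magnitude subtraction.
    """
    power = 2.0 * exponent
    cleaned = []
    for x, r in zip(observed, late_reverb):
        subtracted = x ** power - alpha * r ** power
        floor = beta * x ** power  # keeps the result non-negative
        cleaned.append(max(subtracted, floor) ** (1.0 / power))
    return cleaned


def identify_speaker(log_likelihoods):
    """Pick the speaker with the highest combined score.

    `log_likelihoods` maps a speaker label to a list of log-likelihoods,
    one per compensation parameter set. The equal-weight sum used here
    is an assumed combination rule.
    """
    combined = {spk: sum(scores) for spk, scores in log_likelihoods.items()}
    return max(combined, key=combined.get)


if __name__ == "__main__":
    print(generalized_spectral_subtraction([2.0, 1.0], [1.0, 1.0]))
    print(identify_speaker({"spk_a": [-10.0, -12.0], "spk_b": [-9.0, -20.0]}))
```

In the toy scores above, `spk_b` would win if only the first parameter set were used, while the combined score favors `spk_a`; this is the kind of sensitivity to a single parameter choice that combining several sets is meant to reduce.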

PUBLICATION RECORD

Publication year
2014
Venue
EURASIP Journal on Audio, Speech, and Music Processing
Publication date
2014-04-15
Fields of study
Computer Science
Identifiers
DOI 10.1186/1687-4722-2014-15
Source metadata
Semantic Scholar

REFERENCES

Making Machines Understand Us in Reverberant Rooms: Robustness Against Reverberation for Automatic Speech Recognition
2012, cited by this paper
Dereverberation and denoising based on generalized spectral subtraction by multi-channel LMS algorithm using a small-scale microphone array
2012, influential reference
Distant-Talking Speech Recognition Based on Spectral Subtraction by Multi-Channel LMS Algorithm
2011, cited by this paper
Front-End Factor Analysis for Speaker Verification
2011, cited by this paper
Theoretical Analysis of Musical Noise in Generalized Spectral Subtraction Based on Higher Order Statistics
2011, cited by this paper
Hilbert envelope based features for robust speaker identification under reverberant mismatched conditions
2011, cited by this paper
Signal-Based Performance Evaluation of Dereverberation Algorithms
2010, cited by this paper
An auditory based modulation spectral feature for reverberant speech recognition
2010, cited by this paper
Suppression of Late Reverberation Effect on Speech Signal Using Long-Term Multiple-step Linear Prediction
2009, cited by this paper
A Study of Interspeaker Variability in Speaker Verification
2008, cited by this paper
CENSREC-4: development of evaluation framework for distant-talking speech recognition under reverberant environments
2008, cited by this paper
Blind dereverberation based on CMN and spectral subtraction by multi-channel LMS algorithm
2008, cited by this paper
Far-Field Speaker Recognition
2007, cited by this paper
Robust distant speaker recognition based on position-dependent CMN by combining speaker-specific GMM with speaker-adapted HMM
2007, cited by this paper
Precise Dereverberation Using Multichannel Linear Prediction
2007, cited by this paper
Acoustic MIMO Signal Processing
2006, cited by this paper
A two-stage algorithm for one-microphone reverberant speech enhancement
2006, cited by this paper
Model Adaptation for Long Convolutional Distortion by Maximum Likelihood Based State Filtering Approach
2006, cited by this paper
Support vector machines for speaker and language recognition
2006, cited by this paper
Spectral Subtraction Steered by Multi-Step Forward Linear Prediction For Single Channel Speech Dereverberation
2006, influential reference
Optimal step size of the adaptive multichannel LMS algorithm for blind SIMO identification
2005, cited by this paper
An Amharic speech corpus for large vocabulary continuous speech recognition
2005, cited by this paper
Multi-channel speech dereverberation based on a statistical model of late reverberation
2005, cited by this paper
Subspace Methods for Multimicrophone Speech Dereverberation
2002, cited by this paper
Adaptive multi-channel least mean square and Newton algorithms for blind channel identification
2002, cited by this paper
Adaptive blind channel identification: Multi-channel least mean square and Newton algorithms
2002, cited by this paper
Double the trouble: handling noise and reverberation in far-field automatic speech recognition
2002, cited by this paper
A New Method Based on Spectral Subtraction for Speech Dereverberation
2001, cited by this paper
Evaluating long-term spectral subtraction for reverberant ASR
2001, cited by this paper
Speaker Verification Using Adapted Gaussian Mixture Models
2000, cited by this paper
Acoustical Sound Database in Real Environments for Sound Scene Understanding and Hands-Free Speech Recognition
2000, cited by this paper
JNAS: Japanese speech corpus for large vocabulary continuous speech recognition research
1999, cited by this paper
Performance of an HMM speech recognizer using a real-time tracking microphone array as input
1999, cited by this paper
A parametric formulation of the generalized spectral subtraction method
1998, influential reference
Recognizing reverberant speech with RASTA-PLP
1997, cited by this paper
Speaker identification and verification using Gaussian mixture speaker models
1995, cited by this paper
Speaker recognition using neural networks and conventional classifiers
1994, cited by this paper
Cepstrum based deconvolution for speech dereverberation
1994, cited by this paper
Stable dereverberation using microphone arrays for speaker verification
1994, cited by this paper
Efficient Cepstral Normalization for Robust Speech Recognition
1993, influential reference
Cepstral analysis technique for automatic speaker verification
1981, cited by this paper
Suppression of acoustic noise in speech using spectral subtraction
1979, cited by this paper

CITED BY

A Survey of Speaker Recognition: Fundamental Theories, Recognition Methods and Opportunities
2021, cites this paper
Transformation of Voice Signals to Spatial Domain for Code Optimization in Digital Image Processing
2020, cites this paper
Robust spoken term detection using partial search and re-scoring hypothesized detections techniques
2019, cites this paper
Speaker identification features extraction methods: A systematic review
2017, cites this paper
Multi-channel i-vector combination for robust speaker verification in multi-room domestic environments
2016, cites this paper
Robust speaker recognition by means of acoustic transmission channel matching: An acoustic parameter estimation approach
2016, cites this paper
Multi-channel speaker verification based on total variability modelling
2015, cites this paper
Distant-talking accent recognition by combining GMM and DNN
2015, cites this paper
Combination of bottleneck feature extraction and dereverberation for distant-talking speech recognition
2015, cites this paper
Speech selection and environmental adaptation for asynchronous speech recognition
2015, cites this paper
Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification
2015, cites this paper
Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker
2015, cites this paper
Single-sided approach to discriminative PLDA training for text-independent speaker verification without using expanded i-vector
2014, cites this paper
Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording
2014, cites this paper
ARASID: Artificial Reverberation-Adjusted Indoor Speaker Identification Dealing with Variable Distances
year unknown, cites this paper