Deep joint learning for language recognition

Published 2021 in Neural Networks

ABSTRACT

Deep learning methods for language recognition have achieved promising performance. However, most studies focus on frameworks built for a single type of acoustic feature and a single task. In this paper, we propose deep joint learning strategies based on Multi-Feature (MF) and Multi-Task (MT) models. First, we investigate the efficiency of integrating multiple acoustic features and explore two kinds of training constraints: one introduces auxiliary classification constraints with adaptive weights for the loss functions in the feature encoder sub-networks, and the other introduces a Canonical Correlation Analysis (CCA) constraint to maximize the correlation between different feature representations. Second, correlated speech tasks, such as phoneme recognition, are applied as auxiliary tasks so that the model learns related information that enhances language recognition performance. We analyze phoneme-aware information from different learning strategies, namely joint learning at the frame level, adversarial learning at the segment level, and a combination of the two. In addition, we present the Language-Phoneme embedding extraction structure to learn and extract language and phoneme embedding representations simultaneously. We demonstrate the effectiveness of the proposed approaches with experiments on the Oriental Language Recognition (OLR) data sets. Experimental results indicate that joint learning on the multi-feature and multi-task models extracts intrinsic feature representations for language identities and improves performance, especially under challenging conditions such as cross-channel or open-set recognition.
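
The abstract mentions auxiliary classification constraints with adaptive weights but does not give the weighting formula. As a rough illustration only, the sketch below assumes the uncertainty-style weighting from the cited "Multi-task Learning Using Uncertainty to Weigh Losses" reference, in its common simplified form total = sum_i exp(-s_i) * L_i + s_i, where each s_i is a learnable log-variance. The function names and the loss values are hypothetical and are not taken from the paper.

```python
import math


def adaptive_weighted_loss(losses, log_vars):
    """Combine per-branch losses with learnable log-variances s_i.

    Assumed form (not confirmed by the abstract):
        total = sum_i exp(-s_i) * L_i + s_i
    A larger s_i down-weights branch i; the additive s_i term keeps
    the weights from collapsing to zero.
    """
    if len(losses) != len(log_vars):
        raise ValueError("losses and log_vars must have the same length")
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


def log_var_gradients(losses, log_vars):
    """Gradient of the total with respect to each s_i: 1 - exp(-s_i) * L_i."""
    return [1.0 - math.exp(-s) * loss for loss, s in zip(losses, log_vars)]


# Hypothetical values: one main language-classification loss and two
# auxiliary losses from feature encoder sub-networks.
losses = [0.9, 1.6, 2.4]
log_vars = [0.0, 0.0, 0.0]  # s_i = 0 means every branch starts with weight 1

total = adaptive_weighted_loss(losses, log_vars)
grads = log_var_gradients(losses, log_vars)

# For fixed losses, the total is minimized at s_i = log(L_i),
# which gives branch i an effective weight of 1 / L_i.
optimal_log_vars = [math.log(loss) for loss in losses]
optimal_weights = [math.exp(-s) for s in optimal_log_vars]
```

In an actual training loop the s_i values would be trainable parameters updated by the optimizer alongside the network weights; the closed-form optimum above only shows why branches with larger losses end up with smaller weights under this assumed scheme.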

PUBLICATION RECORD

Publication year
2021
Venue
Neural Networks
Publication date
2021-03-26
Fields of study
Medicine, Linguistics, Computer Science
Identifiers
DOI 10.1016/j.neunet.2021.03.026 PMID 33866304
Source metadata
Semantic Scholar, PubMed

REFERENCES

AP20-OLR Challenge: Three Tasks and Their Baselines
2020, cited by this paper
Text Adaptation for Speaker Verification with Speaker-Text Factorized Embeddings
2020, cited by this paper
Speaker Embedding Extraction with Multi-feature Integration Structure
2019, cited by this paper
AP19-OLR Challenge: Three Tasks and Their Baselines
2019, cited by this paper
Phone-Aware Multi-task Learning and Length Expanding for Short-Duration Language Recognition
2019, cited by this paper
State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18
2019, cited by this paper
On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction
2019, cited by this paper
A Robust Text-independent Speaker Verification Method Based on Speech Separation and Deep Speaker
2019, cited by this paper
Performance Analysis of the 2017 NIST Language Recognition Evaluation
2018, cited by this paper
Speaker Embedding Extraction with Phonetic Information
2018, cited by this paper
X-Vectors: Robust DNN Embeddings for Speaker Recognition
2018, cited by this paper
Phonetic Temporal Neural Model for Language Identification
2017, cited by this paper
Kazakh and Russian Languages Identification Using Long Short-Term Memory Recurrent Neural Networks
2017, cited by this paper
AP17-OLR challenge: Data, plan, and baseline
2017, cited by this paper
Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
2017, cited by this paper
Automatic differentiation in PyTorch
2017, cited by this paper
Conditional Generative Adversarial Nets Classifier for Spoken Language Identification
2017, cited by this paper
AP16-OL7: A multilingual database for oriental languages and a language recognition baseline
2016, cited by this paper
Investigation of Senone-based Long-Short Term Memory RNNs for Spoken Language Recognition
2016, cited by this paper
Deep Neural Network Approaches to Speaker and Language Recognition
2015, cited by this paper
Librispeech: An ASR corpus based on public domain audio books
2015, cited by this paper
End-to-end text-dependent speaker verification
2015, cited by this paper
A novel scheme for speaker recognition using a phonetically-aware deep neural network
2014, cited by this paper
Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition
2014, cited by this paper
TANDEM-bottleneck feature combination using hierarchical Deep Neural Networks
2014, cited by this paper
Automatic language identification using deep neural networks
2014, cited by this paper
Deep Canonical Correlation Analysis
2013, cited by this paper
Speaker adaptation of neural network acoustic models using i-vectors
2013, cited by this paper
Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers
2013, cited by this paper
The Albayzin 2012 Language Recognition Evaluation Plan (Albayzin 2012 LRE)
2012, cited by this paper
Front-End Factor Analysis for Speaker Verification
2011, cited by this paper
Language Recognition via i-vectors and Dimensionality Reduction
2011, cited by this paper
Language Recognition in iVectors Space
2011, cited by this paper
The Kaldi Speech Recognition Toolkit
2011, cited by this paper
Multimodal Deep Learning
2011, cited by this paper
Loquendo-Politecnico di Torino system for the 2009 NIST Language Recognition Evaluation
2010, cited by this paper
Multi-feature combination for speaker recognition
2010, cited by this paper
Unsupervised analysis of fMRI data using kernel canonical correlation
2007, cited by this paper
Combining evidence from residual phase and MFCC features for speaker recognition
2006, cited by this paper
Inferring a Semantic Representation of Text via Cross-Language Correlation Analysis
2002, cited by this paper

CITED BY

NanoSSL: attention mechanism-based self-supervised learning method for protein identification using nanopores
2025, cites this paper
PLDE: A lightweight pooling layer for spoken language recognition
2024, cites this paper
Deep temporal representation learning for language identification
2024, cites this paper
Deep joint learning valuation of Bermudan swaptions
2024, cites this paper
Front-face excitation-emission matrix fluorescence spectroscopy combined with interpretable deep learning for the rapid identification of the storage year of Ningxia wolfberry
2023, cites this paper
Cross-Corpora Spoken Language Identification with Domain Diversification and Generalization
2023, cites this paper
Three-stage training and orthogonality regularization for spoken language recognition
2023, influential citation
Common latent representation learning for low-resourced spoken language identification
2023, cites this paper
Deep neural network learning biological condition information refines gene-expression-based cell subtypes
2023, cites this paper
Multi-domain Attention Fusion Network For Language Recognition
2022, cites this paper
Enhancing the Generalization Performance of Few-Shot Image Classification with Self-Knowledge Distillation
2022, cites this paper
DEFEAT: Decoupled feature attack across deep neural networks
2022, cites this paper
Additive Phoneme-aware Margin Softmax Loss for Language Recognition
2021, cites this paper
Modeling and Training Strategies for Language Recognition Systems
2021, cites this paper
Phoneme-aware and Channel-wise Attentive Learning for Text Dependent Speaker Verification
2021, cites this paper
Special Issue on Advances in Deep Learning Based Speech Processing
year unknown, cites this paper