Towards Speech Emotion Recognition "in the Wild" Using Aggregated Corpora and Deep Multi-Task Learning

Jaebok Kim, G. Englebienne, K. Truong, V. Evers

Published 2017 in Interspeech

ABSTRACT

One of the challenges in Speech Emotion Recognition (SER) "in the wild" is the large mismatch between training and test data (e.g. in speakers and tasks). To improve the generalisation capabilities of emotion models, we propose using Multi-Task Learning (MTL) in deep neural networks, with gender and naturalness as auxiliary tasks. The method was evaluated in within-corpus and various cross-corpus classification experiments that simulate conditions "in the wild". Compared with state-of-the-art methods based on Single-Task Learning (STL), the proposed MTL method improved performance significantly. In particular, models using both gender and naturalness achieved larger gains than those using either auxiliary task alone. This benefit was also visible in the high-level feature representations learned by the proposed method, in which discriminative emotion clusters could be observed.
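The multi-task objective described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes hard parameter sharing (one shared hidden layer feeding three softmax heads), a hypothetical four-class emotion label set, binary gender labels, binary naturalness labels (assumed here to mean acted vs. spontaneous), and hypothetical auxiliary weights, so the joint loss is L = L_emotion + α·L_gender + β·L_naturalness. Only the forward pass and the weighted loss are shown; training with a gradient-based optimiser is left out. All function and class names are illustrative.

```python
import math
import random

# Minimal MTL sketch (assumption: hard parameter sharing; not the authors' exact network).


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Negative log-likelihood of the integer class label `target`."""
    return -math.log(probs[target])


def linear(x, W, b):
    """Affine map: one row of W (and one bias) per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


class MultiTaskNet:
    """One shared hidden layer feeding three task-specific softmax heads."""

    def __init__(self, n_features, n_hidden, n_emotions, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_features, n_hidden, rng)
        self.heads = {
            "emotion": init_layer(n_hidden, n_emotions, rng),  # main task
            "gender": init_layer(n_hidden, 2, rng),            # auxiliary task
            "naturalness": init_layer(n_hidden, 2, rng),       # auxiliary: acted vs. spontaneous (assumed)
        }

    def forward(self, x):
        # The shared representation h is what all three tasks jointly shape during training.
        h = relu(linear(x, *self.shared))
        return {task: softmax(linear(h, W, b)) for task, (W, b) in self.heads.items()}


def multitask_loss(outputs, labels, weights):
    """Weighted sum of per-task cross-entropies; tasks absent from `weights` are ignored."""
    return sum(weights[t] * cross_entropy(outputs[t], labels[t]) for t in weights)


if __name__ == "__main__":
    net = MultiTaskNet(n_features=10, n_hidden=8, n_emotions=4, seed=1)
    x = [0.5] * 10  # placeholder acoustic feature vector (hypothetical)
    labels = {"emotion": 1, "gender": 0, "naturalness": 1}
    out = net.forward(x)
    stl = multitask_loss(out, labels, {"emotion": 1.0})
    mtl = multitask_loss(out, labels, {"emotion": 1.0, "gender": 0.3, "naturalness": 0.3})
    print(f"STL loss: {stl:.4f}  MTL loss: {mtl:.4f}")
```

Passing only the emotion weight (or setting both auxiliary weights to zero) reduces the objective to single-task emotion classification, which corresponds to the STL baseline the abstract compares against. Because the auxiliary heads only shape the shared layer, they could be dropped at inference time.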

PUBLICATION RECORD

Publication year
2017
Venue
Interspeech
Publication date
2017-08-13
Fields of study
Linguistics, Computer Science
Identifiers
DOI 10.21437/Interspeech.2017-736 arXiv 1708.03920
Source metadata
Semantic Scholar

REFERENCES

Nonparametric Statistical Inference
2020 · influential reference
Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network
2016 · cited by this paper
Deep Cross Residual Learning for Multitask Visual Recognition
2016 · cited by this paper
Cross-corpus acoustic emotion recognition from singing and speaking: A multi-task learning approach
2016 · cited by this paper
Learning Representations of Affect from Speech
2015 · cited by this paper
High-level feature representation using recurrent neural network for speech emotion recognition
2015 · influential reference
Dropout: a simple way to prevent neural networks from overfitting
2014 · cited by this paper
Linked Source and Target Domain Subspace Feature Transfer Learning -- Exemplified by Speech Emotion Recognition
2014 · cited by this paper
Speech emotion recognition using deep neural network and extreme learning machine
2014 · influential reference
Adam: A Method for Stochastic Optimization
2014 · cited by this paper
Autoencoder-based Unsupervised Domain Adaptation for Speech Emotion Recognition
2014 · cited by this paper
Facial Landmark Detection by Deep Multi-task Learning
2014 · cited by this paper
Multi-task learning in deep neural networks for improved phoneme recognition
2013 · cited by this paper
Acoustic Modeling Using Deep Belief Networks
2012 · cited by this paper
A multitask approach to continuous five-dimensional affect sensing in natural speech
2012 · cited by this paper
Representation Learning: A Review and New Perspectives
2012 · cited by this paper
Selecting Training Data for Cross-Corpus Speech Emotion Recognition: Prototypicality vs. Generalization
2011 · cited by this paper
Unsupervised learning in cross-corpus acoustic emotion recognition
2011 · cited by this paper
Cross-Corpus Acoustic Emotion Recognition: Variances and Strategies
2010 · influential reference
IEMOCAP: interactive emotional dyadic motion capture database
2008 · cited by this paper
Visualizing Data using t-SNE
2008 · cited by this paper
Emotional speech recognition: Resources, features, and methods
2006 · cited by this paper
The eNTERFACE'05 Audio-Visual Emotion Database
2006 · cited by this paper
A database of German emotional speech
2005 · cited by this paper
Comparing Feature Sets for Acted and Spontaneous Speech in View of Automatic Emotion Recognition
2005 · cited by this paper
“You Stupid Tin Box” - Children Interacting with the AIBO Robot: A Cross-linguistic Emotional Speech Corpus
2004 · cited by this paper
A Model of Inductive Bias Learning
2000 · cited by this paper
Automatic early stopping using cross validation: quantifying the criteria
1998 · cited by this paper
Multitask Learning
1997 · influential reference
A solvable connectionist model of immediate recall of ordered lists
1994 · cited by this paper
Multitask Learning: A Knowledge-Based Source of Inductive Bias
1993 · cited by this paper
Gender differences in emotional development: A review of theories and research
1985 · cited by this paper

CITED BY

Wav2TP: A novel speech emotion recognition model using temporal pooling over transformer-based Wav2Vec2 embeddings
2026 · cites this paper
Multitask Transformer for Cross-Corpus Speech Emotion Recognition
2025 · cites this paper
Towards Cross-Task Suicide Risk Detection via Speech LLM
2025 · cites this paper
Building a Gender-Bias-Resistant Super Corpus as a Deep Learning Baseline for Speech Emotion Recognition
2025 · influential citation
Cohort-Sensitive Labeling: An Effective Approach for Enhancing ASR Performance
2025 · cites this paper
Pareto Set Learning for Multi-Objective Reinforcement Learning
2025 · cites this paper
Improving Domain Generalization in Speech Emotion Recognition with Whisper
2024 · cites this paper
Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability
2024 · cites this paper
GMP-TL: Gender-Augmented Multi-Scale Pseudo-Label Enhanced Transfer Learning For Speech Emotion Recognition
2024 · cites this paper
Linguistic based emotion analysis using softmax over time attention mechanism
2024 · cites this paper
Identity, Gender, Age, and Emotion Recognition from Speaker Voice with Multi-task Deep Networks for Cognitive Robotics
2024 · cites this paper
An Enroll-to-Verify Approach for Cross-Task Unseen Emotion Class Recognition
2023 · cites this paper
Bulletin of Electrical Engineering and Informatics
2023 · cites this paper
Elicitation-Based Curriculum Learning for Improving Speech Emotion Recognition
2023 · cites this paper
Improving Speech Emotion Recognition with Data Expression Aware Multi-Task Learning
2023 · cites this paper
Episodic Memory For Domain-Adaptable, Robust Speech Emotion Recognition
2023 · cites this paper
On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition
2023 · cites this paper
Speech Emotion Recognition Using Attention Model
2023 · cites this paper
Speech Emotion Recognition Using Energies in six bands and Multilayer Perceptron on RAVDESS Dataset
2022 · influential citation
Multi-task Learning for Speech Emotion and Emotion Intensity Recognition
2022 · cites this paper
Multi-task learning from Unlabelled Data to Improve Cross Language Speech Emotion Recognition
2022 · cites this paper
Speaker Characterization by means of Attention Pooling
2022 · cites this paper
Refined Feature Vectors for Human Emotion Classifier by combining multiple learning strategies with Recurrent Neural Networks
2022 · cites this paper
Speech Emotion Recognition in the Wild using Multi-task and Adversarial Learning
2022 · cites this paper
Multi-Corpus Speech Emotion Recognition for Unseen Corpus Using Corpus-Wise Weights in Classification Loss
2022 · cites this paper
State & Trait Measurement from Nonverbal Vocalizations: A Multi-Task Joint Learning Approach
2022 · cites this paper
Cross-Language Speech Emotion Recognition Using Bag-of-Word Representations, Domain Adaptation, and Data Augmentation
2022 · cites this paper
ScSer: Supervised Contrastive Learning for Speech Emotion Recognition using Transformers
2022 · cites this paper
EnsembleWave: An ensembled approach for Automatic Speech Emotion Recognition
2022 · cites this paper
Lipopolysaccharide-Induced Model of Neuroinflammation: Mechanisms of Action, Research Application and Future Directions for Its Use
2022 · cites this paper
Deep Learning for Audio Visual Emotion Recognition
2022 · cites this paper
Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding
2022 · cites this paper
Semi-supervised cross-lingual speech emotion recognition
2022 · cites this paper
Multitask Learning From Augmented Auxiliary Data for Improving Speech Emotion Recognition
2022 · cites this paper
Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition
2022 · cites this paper
Does your robot know? Enhancing children's information retrieval through spoken conversation with responsible robots
2021 · cites this paper
Deep Learning Techniques for Speech Emotion Recognition, from Databases to Models
2021 · cites this paper
Disentanglement for Audio-Visual Emotion Recognition Using Multitask Setup
2021 · cites this paper
EmoNet: A Transfer Learning Framework for Multi-Corpus Speech Emotion Recognition
2021 · cites this paper
Domain Generalization with Triplet Network for Cross-Corpus Speech Emotion Recognition
2021 · cites this paper
Speaking of Trust - Speech as a Measure of Trust
2021 · cites this paper
Cross-Corpus Speech Emotion Recognition Based on Few-Shot Learning and Domain Adaptation
2021 · influential citation
Unsupervised Cross-Lingual Speech Emotion Recognition Using Pseudo Multilabel
2021 · cites this paper
Dual Multi-Task Network with Bridge-Temporal-Attention for Student Emotion Recognition via Classroom Video
2021 · cites this paper
Survey of Deep Representation Learning for Speech Emotion Recognition
2021 · cites this paper
Privacy and Utility Preserving Data Transformation for Speech Emotion Recognition
2021 · cites this paper
Transformer-based transfer learning and multi-task learning for improving the performance of speech emotion recognition
2021 · cites this paper
An Ensemble 1D-CNN-LSTM-GRU Model with Data Augmentation for Speech Emotion Recognition
2021 · cites this paper
Filters Know How You Feel: Explaining Intermediate Speech Emotion Classification Representations
2021 · cites this paper
Recognition of Cross-Language Acoustic Emotional Valence Using Stacked Ensemble Learning
2020 · cites this paper
Study the Influence of Gender and Age in Recognition of Emotions from Algerian Dialect Speech
2020 · cites this paper
Towards affect‐aware vehicles for increasing safety and comfort: recognising driver emotions from audio recordings in a realistic driving study
2020 · cites this paper
End-To-End Speech Emotion Recognition Based on Time and Frequency Information Using Deep Neural Networks
2020 · cites this paper
Meta-transfer learning for emotion recognition
2020 · cites this paper
Model Smoothing using Virtual Adversarial Training for Speech Emotion Estimation using Spontaneity
2020 · cites this paper
Adaptive Domain-Aware Representation Learning for Speech Emotion Recognition
2020 · cites this paper
Speech Emotion Recognition 'in the Wild' Using an Autoencoder
2020 · influential citation
Towards Understanding Attention-Based Speech Recognition Models
2020 · cites this paper
Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers
2020 · cites this paper
Improving communication skills of children with autism through support of applied behavioral analysis treatments using multimedia computing: a survey
2020 · cites this paper
Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends
2020 · cites this paper
Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition
2019 · cites this paper
Improving Cross-Corpus Speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG)
2019 · cites this paper
Semi-Supervised Speech Emotion Recognition With Ladder Networks
2019 · cites this paper
Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion
2019 · cites this paper
A Cross-Corpus Study on Speech Emotion Recognition
2019 · cites this paper
Holistic Affect Recognition Using PaNDA: Paralinguistic Non-Metric Dimensional Analysis
2019 · cites this paper
Machine Learning Methods for Quantification of Depression Severity and Prediction of Recovery Trajectory using Longitudinal Video and Audio Data, with Applications to Deep Brain Stimulation Treatment Optimization
2019 · cites this paper
Emotion recognition from audio, dimensional and discrete categorization using CNNs
2019 · cites this paper
Model Smoothing Using Virtual Adversarial Training for Speech Emotion Estimation
2019 · cites this paper
Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition
2019 · cites this paper
Pareto Multi-Task Learning
2019 · cites this paper
Analysis of Deep Learning Architectures for Cross-Corpus Speech Emotion Recognition
2019 · cites this paper
Speech Emotion Recognition Based on Multi-Task Learning
2019 · cites this paper
Speech emotion recognition based on hierarchical attributes using feature nets
2019 · influential citation
Cross-lingual Speech Emotion Recognition through Factor Analysis
2018 · cites this paper
Ladder Networks for Emotion Recognition: Using Unsupervised Auxiliary Tasks to Improve Predictions of Emotional Attributes
2018 · cites this paper
Sharelatex Example
2018 · cites this paper
End-to-end Multimodal Emotion and Gender Recognition with Dynamic Weights of Joint Loss
2018 · cites this paper
Multi-Modal Multi-Task Deep Learning For Speaker And Emotion Recognition Of TV-Series Data
2018 · cites this paper
Multimodal Speech Emotion Recognition Using Audio and Text
2018 · cites this paper
End-to-end Multimodal Emotion and Gender Recognition with Dynamic Joint Loss Weights
2018 · cites this paper
Predicting Categorical Emotions by Jointly Learning Primary and Secondary Emotions through Multitask Learning
2018 · cites this paper
ASR-based Features for Emotion Recognition: A Transfer Learning Approach
2018 · cites this paper
Transferring Age and Gender Attributes for Dimensional Emotion Prediction from Big Speech Data Using Hierarchical Deep Learning
2018 · influential citation
End-to-End Listening Agent for Audiovisual Emotional and Naturalistic Interactions
2018 · cites this paper
Speech Emotion Recognition via Contrastive Loss under Siamese Networks
2018 · cites this paper
Learning Spontaneity to Improve Emotion Recognition In Speech
2017 · cites this paper
Planning Based System for Child-Robot Interaction in Dynamic Play Environments
2017 · cites this paper
Deep Temporal Models using Identity Skip-Connections for Speech Emotion Recognition
2017 · cites this paper
Learning spectro-temporal features with 3D CNNs for speech emotion recognition
2017 · cites this paper