Audiovisual three-level fusion for continuous estimation of Russell's emotion circumplex
Enrique Sánchez-Lozano, Paula Lopez-Otero, Laura Docío Fernández, Enrique Argones-Rúa, J. Alba-Castro
Published 2013 in AVEC@ACM Multimedia
ABSTRACT
Predicting human emotions is attracting attention across many research areas that demand accurate predictions in uncontrolled scenarios. Despite this interest, systems designed for emotion detection are still far from being as accurate as desired. Human emotions are typically described in terms of two dimensions, valence and arousal, which span Russell's circumplex, where complex emotions lie. Accordingly, the Affect Recognition Sub-Challenge (ASC) of the third Audio/Visual Emotion and Depression Challenge, AVEC'13, focuses on estimating these two dimensions. This paper presents a three-level fusion system that combines single-regression results from audio and visual features in order to maximize the mean average correlation over both dimensions. Five sets of features are extracted (three from audio and two from video) and merged following an iterative process. Results show that this fusion outperforms the baseline method on the challenge database.
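The core idea of the abstract, fusing per-modality regression outputs so that the correlation with a ground-truth affect dimension is maximized, can be sketched as a toy example. This is not the paper's actual three-level scheme or its five feature sets; the single grid-searched weight and the function names below are assumptions for illustration only.

```python
# Illustrative sketch only: late fusion of two per-modality regressor
# outputs, choosing a single mixing weight that maximizes the Pearson
# correlation of the fused prediction with the ground-truth dimension
# (e.g. valence or arousal). The grid search is a simplification of the
# paper's iterative merging process.

def pearson(x, y):
    """Plain Pearson correlation coefficient between two sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def fuse(audio_pred, video_pred, truth, steps=100):
    """Grid-search a weight w in [0, 1] so that the fused prediction
    w*audio + (1-w)*video maximizes correlation with the ground truth."""
    best_w, best_r = 0.0, float("-inf")
    for i in range(steps + 1):
        w = i / steps
        fused = [w * a + (1 - w) * v
                 for a, v in zip(audio_pred, video_pred)]
        r = pearson(fused, truth)
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r
```

In the paper's setting, a search of this kind would be repeated over the regressors built from the five feature sets and over both affect dimensions; by construction the fused correlation is never worse than that of either modality alone on the data used for the search.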
PUBLICATION RECORD
- Publication year
2013
- Venue
AVEC@ACM Multimedia
- Publication date
2013-10-21
- Fields of study
Computer Science, Psychology
- Source metadata
Semantic Scholar
REFERENCES
- 38 references
CITED BY
- 34 citing papers