Audiovisual three-level fusion for continuous estimation of Russell's emotion circumplex
Enrique Sánchez-Lozano, Paula Lopez-Otero, Laura Docío Fernández, Enrique Argones-Rúa, J. Alba-Castro
Published 2013 in AVEC@ACM Multimedia
ABSTRACT
Predicting human emotions is attracting attention across many research areas that demand accurate predictions in uncontrolled scenarios. Despite this interest, systems designed for emotion detection are still far from being as accurate as desired. Human emotions are typically described in terms of two dimensions, valence and arousal, which span Russell's circumplex, where complex emotions lie. Accordingly, the Affect Recognition Sub-Challenge (ASC) of the third Audio/Visual Emotion and Depression Challenge, AVEC'13, focuses on estimating these two dimensions. This paper presents a three-level fusion system that combines single-regression results from audio and visual features in order to maximize the mean average correlation over both dimensions. Five sets of features are extracted (three from audio and two from video) and merged following an iterative process. Results show that this fusion outperforms the baseline method on the challenge database.
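The core idea of the abstract, fusing per-modality regression outputs so that the correlation with a ground-truth affect dimension is maximized, can be sketched as a toy example. This is not the paper's actual three-level scheme or its five feature sets; the single grid-searched weight and the function names below are assumptions for illustration only.

```python
# Illustrative sketch only: late fusion of two per-modality regressor
# outputs, choosing a single mixing weight that maximizes the Pearson
# correlation of the fused prediction with the ground-truth dimension
# (e.g. valence or arousal). The grid search is a simplification of the
# paper's iterative merging process.

def pearson(x, y):
    """Plain Pearson correlation coefficient between two sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def fuse(audio_pred, video_pred, truth, steps=100):
    """Grid-search a weight w in [0, 1] so that the fused prediction
    w*audio + (1-w)*video maximizes correlation with the ground truth."""
    best_w, best_r = 0.0, float("-inf")
    for i in range(steps + 1):
        w = i / steps
        fused = [w * a + (1 - w) * v
                 for a, v in zip(audio_pred, video_pred)]
        r = pearson(fused, truth)
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r
```

In the paper's setting, a search of this kind would be repeated over the regressors built from the five feature sets and over both affect dimensions; by construction the fused correlation is never worse than that of either modality alone on the data used for the search.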
PUBLICATION RECORD
- Publication year
2013
- Venue
AVEC@ACM Multimedia
- Publication date
2013-10-21
- Fields of study
Computer Science, Psychology
- Source metadata
Semantic Scholar
REFERENCES
- 38 references
CITED BY
- 34 citing papers