Prosodic alignment toward emotionally expressive speech: Comparing human and Alexa model talkers
Michelle Cohn, Kristin Predeck, Melina Sarian, Georgia Zellou
Published 2021 in Speech Communication
ABSTRACT
This study tests whether individuals vocally align toward emotionally expressive prosody produced by two types of interlocutors: a human and a voice-activated artificially intelligent (voice-AI) assistant. Participants completed a word shadowing experiment of interjections (e.g., “Awesome”) produced in emotionally neutral and expressive prosodies by both a human voice and a voice generated by a voice-AI system (Amazon’s Alexa). Results show increases in participants’ word duration, mean f0, and f0 variation in response to emotional expressiveness, consistent with increased alignment toward a general ‘positive-emotional’ speech style. Small differences in emotional alignment by talker category (human vs. voice-AI) parallel the acoustic differences in the model talkers’ productions, suggesting that participants are mirroring the acoustics they hear. The similar responses to emotion in both a human and a voice-AI talker support accounts of unmediated emotional alignment, as well as computer personification: people apply emotionally mediated behaviors to both types of interlocutors. While there were small differences in magnitude by participant gender, the overall patterns were similar for women and men, supporting a nuanced picture of emotional vocal alignment.
PUBLICATION RECORD
- Publication year
2021
- Venue
Speech Communication
- Publication date
2021-10-01
- Fields of study
Linguistics, Psychology, Computer Science
- Source metadata
Semantic Scholar