The One Where They Brain-Tune for Social Cognition: Multi-Modal Brain-Tuning on Friends

Nico Policzer, Cameron Braunstein, M. Toneva

Published 2025 in arXiv.org

ABSTRACT

Recent studies on audio models show that brain-tuning (fine-tuning models to better predict corresponding fMRI activity) improves brain alignment and increases performance on downstream semantic and audio tasks. We extend this approach to a multimodal audio-video model to enhance social cognition, targeting the Superior Temporal Sulcus (STS), a key region for social processing, while subjects watch Friends. We find significant increases in brain alignment to the STS and an adjacent ROI, as well as improvements on a social cognition task related to the training data: sarcasm detection in sitcoms. In summary, our study extends brain-tuning to the multi-modal domain, demonstrating improvements on a downstream task after tuning to a relevant functional region.

PUBLICATION RECORD

Publication year
2025
Venue
arXiv.org
Publication date
2025-11-11
Fields of study
Computer Science, Psychology
Identifiers
DOI 10.48550/arXiv.2511.07988, arXiv 2511.07988
Source metadata
Semantic Scholar

REFERENCES

Alignment of auditory artificial networks with massive individual fMRI brain data leads to generalisable improvements in brain encoding and downstream tasks
2025, influential reference
TRIBE: TRImodal Brain Encoder for whole-brain fMRI response prediction
2025, cited by this paper
Brain-Informed Fine-Tuning for Improved Multilingual Understanding in Language Models
2025, cited by this paper
Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain)
2025, cited by this paper
Multi-modal brain encoding models for multi-modal stimuli
2025, influential reference
Modeling dynamic social vision highlights gaps between deep learning and humans
2025, cited by this paper
A Survey of Multimodal Learning: Methods, Applications, and Future
2025, cited by this paper
Improving semantic understanding in speech language models via brain-tuning
2024, influential reference
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
2024, cited by this paper
Unveiling Multi-level and Multi-modal Semantic Representations in the Human Brain using Large Language Models
2024, cited by this paper
Multimodal processing in face-to-face interactions: A bridging link between psycholinguistics and sensory neuroscience
2023, cited by this paper
ImageBind: One Embedding Space to Bind Them All
2023, cited by this paper
Scaling laws for language encoding models in fMRI
2023, cited by this paper
Emotion detection and its influence on popularity in a social network-based on the American TV series friends
2023, cited by this paper
Vision-Language Integration in Multimodal Video Transformers (Partially) Aligns with the Brain
2023, influential reference
TVLT: Textless Vision-Language Transformer
2022, influential reference
Social Neuro AI: Social Interaction as the “Dark Matter” of AI
2021, cited by this paper
Functional selectivity for social interaction perception in the human superior temporal sulcus during natural viewing
2021, cited by this paper
Evidence for a Third Visual Pathway Specialized for Social Perception.
2020, influential reference
The neural architecture of language: Integrative modeling converges on predictive processing
2020, cited by this paper
Towards Multimodal Sarcasm Detection (An _Obviously_ Perfect Paper)
2019, cited by this paper
Neural responses to visually observed social interactions
2018, cited by this paper
Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph
2018, cited by this paper
Perceiving social interactions in the posterior superior temporal sulcus
2017, cited by this paper
PT735. Multisensory integration of social interaction
2016, cited by this paper
Functional Organization of Social Perception and Cognition in the Superior Temporal Sulcus
2015, cited by this paper
Integrating Face and Voice in Person Perception
2013, cited by this paper
Superior Temporal Sulcus - It's My Area: Or Is It?
2008, cited by this paper
Social perception from visual cues: role of the STS region.
2000, cited by this paper

CITED BY

Cognitive Dark Matter: Measuring What AI Misses
2026, cites this paper