Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning

Hossein R. Nowdeh, Jie Ji, Xiaolong Ma, Fatemeh Afghah

Published 2025 in arXiv.org

ABSTRACT

In multimodal learning, dominant modalities often overshadow others, limiting generalization. We propose Modality-Aware Sharpness-Aware Minimization (M-SAM), a model-agnostic framework that applies to many modalities and supports both early and late fusion. In every iteration, M-SAM optimizes learning in three steps. First, it identifies the dominant modality from each modality's contribution to accuracy, measured with Shapley values. Second, it decomposes the loss landscape; in other words, it modulates the loss to prioritize the model's robustness with respect to the dominant modality. Third, it updates the weights by backpropagating the modulated gradients. This ensures robust learning for the dominant modality while enhancing contributions from the others, allowing the model to explore and exploit complementary features that strengthen overall performance. Extensive experiments on four diverse datasets show that M-SAM outperforms the latest state-of-the-art optimization and gradient manipulation methods and significantly balances and improves multimodal learning.
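
As a rough illustration of two ingredients the abstract names, the sketch below computes exact Shapley values over modalities from subset accuracies and performs one generic sharpness-aware (SAM) update on a toy loss. It is not the authors' implementation: the accuracy table, the modality names, the quadratic loss, and the helper names `shapley_values` and `sam_step` are all hypothetical, and the abstract does not specify how M-SAM modulates the loss with the Shapley result, so that coupling is left out.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(players, value):
    """Exact Shapley value of each player for a set function `value`.

    `value` maps a frozenset of players to a scalar (here: accuracy).
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One generic SAM update (not the M-SAM rule).

    Perturb the weights along the normalized gradient, then descend
    using the gradient evaluated at the perturbed point.
    """
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    return w - lr * grad_fn(w + eps)


# Hypothetical validation accuracies for each subset of modalities.
accuracy = {
    frozenset(): 0.25,                      # chance level, assumed
    frozenset({"audio"}): 0.60,
    frozenset({"visual"}): 0.40,
    frozenset({"audio", "visual"}): 0.70,
}
phi = shapley_values(["audio", "visual"], lambda s: accuracy[s])
dominant = max(phi, key=phi.get)  # modality with the largest contribution

# Toy quadratic loss 0.5 * w^T A w, standing in for a training loss.
A = np.diag([10.0, 1.0])


def loss(w):
    return 0.5 * w @ A @ w


def grad(w):
    return A @ w


w0 = np.array([1.0, 1.0])
w1 = sam_step(w0, grad)
```

With these assumed accuracies, the Shapley values come out to roughly 0.325 for audio and 0.125 for visual, so audio would be flagged as dominant, and the two values sum to the gap between full-model and chance accuracy (0.45). Exact enumeration scales exponentially in the number of modalities, which is manageable only because multimodal models typically have a handful of them.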

PUBLICATION RECORD

Publication year: 2025
Venue: arXiv.org
Publication date: 2025-10-28
Fields of study: Computer Science
Identifiers: DOI 10.48550/arXiv.2510.24919; arXiv 2510.24919
Source metadata: Semantic Scholar
