Interactive Spatial-Frequency Fusion Mamba for Multi-Modal Image Fusion.

Yixin Zhu, Long Lv, Pingping Zhang, Xuehu Liu, Tongdan Tang, Feng Tian, Weibing Sun, Huchuan Lu

Published 2026 in IEEE Transactions on Image Processing

ABSTRACT

Multi-Modal Image Fusion (MMIF) combines images from different modalities into a single fused image that retains texture details and preserves significant information. Recent MMIF methods incorporate frequency-domain information to enhance spatial features. However, these methods typically rely on simple serial or parallel spatial-frequency fusion, with no interaction between the two domains. In this paper, we propose a novel Interactive Spatial-Frequency Fusion Mamba (ISFM) framework for MMIF. Specifically, we begin with a Modality-Specific Extractor (MSE) that extracts features from each modality and models long-range dependencies across the image with linear computational complexity. To leverage frequency information effectively, we then propose Multi-scale Frequency Fusion (MFF), which adaptively integrates low-frequency and high-frequency components across multiple scales, enabling robust representations of frequency features. More importantly, we propose Interactive Spatial-Frequency Fusion (ISF), which uses frequency features to guide spatial features across modalities, enhancing complementary representations. Extensive experiments on six MMIF datasets demonstrate that ISFM achieves better performance than other state-of-the-art methods. The source code is available at https://github.com/Namn23/ISFM.
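
To make the idea of combining low-frequency and high-frequency components across multiple scales concrete, the following is a minimal sketch of a classical, hand-written fusion rule on 1-D signals. It is not the paper's MFF, which is learned and adaptive; it swaps in a fixed Haar decomposition, averages the low-frequency parts, and keeps the larger-magnitude high-frequency coefficient at each scale. The function names (haar_split, haar_merge, fuse) and the fusion rule itself are assumptions made for illustration only.

```python
def haar_split(signal):
    """One Haar level: pairwise averages (low) and pairwise half-differences (high)."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def haar_merge(low, high):
    """Invert haar_split: each (low, high) pair yields two output samples."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out


def fuse(a, b, levels=2):
    """Fuse two equal-length signals with a fixed multi-scale rule (illustrative only).

    High-frequency coefficients: keep the one with the larger magnitude.
    Low-frequency residual: average the two inputs.
    """
    if len(a) != len(b) or len(a) % (2 ** levels) != 0:
        raise ValueError("inputs must share a length divisible by 2**levels")
    fused_highs = []
    for _ in range(levels):
        a, high_a = haar_split(a)
        b, high_b = haar_split(b)
        fused_highs.append(
            [x if abs(x) >= abs(y) else y for x, y in zip(high_a, high_b)]
        )
    fused = [(x + y) / 2 for x, y in zip(a, b)]
    # Rebuild from the coarsest scale back to the finest.
    for high in reversed(fused_highs):
        fused = haar_merge(fused, high)
    return fused


if __name__ == "__main__":
    flat = [0.0, 0.0, 0.0, 0.0]
    edge = [4.0, 0.0, 0.0, 0.0]
    print(fuse(flat, edge, levels=1))
```

In this toy rule, the detail carried by one input survives in full while the smooth content is shared between the two inputs; a learned method would replace both fixed choices with data-dependent weights.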

PUBLICATION RECORD

Publication year
2026
Venue
IEEE Transactions on Image Processing
Publication date
2026-02-04
Fields of study
Medicine, Computer Science
Identifiers
DOI 10.1109/TIP.2026.3662596 arXiv 2602.04405 PMID 41697815
Source metadata
Semantic Scholar

REFERENCES

Residual Prior-driven Frequency-aware Network for Image Fusion
2025, influential reference
OmniFuse: Composite Degradation-Robust Image Fusion With Language-Driven Semantics
2025, cited by this paper
Artificial intelligence facilitates information fusion for perception in complex environments
2025, cited by this paper
An Efficient Image Fusion Network Exploiting Unifying Language and Mask Guidance
2025, cited by this paper
Shuffle Mamba: State Space Models with Random Shuffle for Multi-Modal Image Fusion
2024, cited by this paper
MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion
2024, influential reference
FusionMamba: dynamic feature enhancement for multimodal image fusion with Mamba
2024, influential reference
Frequency Integration and Spatial Compensation Network for infrared and visible image fusion
2024, influential reference
SFCFusion: Spatial–Frequency Collaborative Infrared and Visible Image Fusion
2024, influential reference
MambaIR: A Simple Baseline for Image Restoration with State-Space Model
2024, cited by this paper
Pan-Mamba: Effective pan-sharpening with State Space Model
2024, cited by this paper
An efficient frequency domain fusion network of infrared and visible images
2024, cited by this paper
VMamba: Visual State Space Model
2024, cited by this paper
Cross-Modal Transformers for Infrared and Visible Image Fusion
2024, cited by this paper
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
2024, cited by this paper
SFDFusion: An Efficient Spatial-Frequency Domain Fusion Network for Infrared and Visible Image Fusion
2024, cited by this paper
An Infrared and Visible Image Fusion Network Based on Frequency Domain Information Extraction
2024, cited by this paper
Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion
2024, cited by this paper
DATFuse: Infrared and Visible Image Fusion via Dual Attention Transformer
2023, influential reference
DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion
2023, influential reference
Segment Anything
2023, cited by this paper
LRRNet: A Novel Representation Learning Guided Fusion Network for Infrared and Visible Images
2023, influential reference
DCFusion: A Dual-Frequency Cross-Enhanced Fusion Network for Infrared and Visible Image Fusion
2023, cited by this paper
An Interactively Reinforced Paradigm for Joint Infrared-Visible Image Fusion and Saliency Object Detection
2023, influential reference
Equivariant Multi-Modality Image Fusion
2023, cited by this paper
Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation
2023, cited by this paper
AdaFuse: Adaptive Medical Image Fusion Based on Spatial-Frequential Cross Attention
2023, influential reference
TransY-Net: Learning Fully Transformer Networks for Change Detection of Remote Sensing Images
2023, cited by this paper
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
2023, cited by this paper
PIAFusion: A progressive infrared and visible image fusion network based on illumination aware
2022, cited by this paper
CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion
2022, cited by this paper
Fusion from Decomposition: A Self-Supervised Decomposition Approach for Image Fusion
2022, influential reference
Unsupervised Misaligned Infrared and Visible Image Fusion via Cross-Modality Image Generation and Registration
2022, influential reference
Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection
2022, influential reference
ReCoNet: Recurrent Correction Network for Fast and Efficient Multi-modality Image Fusion
2022, influential reference
DIVFusion: Darkness-free infrared and visible image fusion
2022, cited by this paper
MATR: Multimodal Medical Image Fusion via Multiscale Adaptive Transformer
2022, cited by this paper
SwinFusion: Cross-domain Long-range Learning for General Image Fusion via Swin Transformer
2022, cited by this paper
STDFusionNet: An Infrared and Visible Image Fusion Network Based on Salient Target Detection
2021, cited by this paper
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
2021, cited by this paper
Rethinking the Image Fusion: A Fast Unified Image Fusion Network based on Proportional Maintenance of Gradient and Intensity
2020, cited by this paper
Infrared and visible image fusion via detail preserving adversarial learning
2020, cited by this paper
U2Fusion: A Unified Unsupervised Image Fusion Network
2020, influential reference
Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains
2020, cited by this paper
DDcGAN: A Dual-Discriminator Conditional Generative Adversarial Network for Multi-Resolution Image Fusion
2020, cited by this paper
GLU Variants Improve Transformer
2020, cited by this paper
On Layer Normalization in the Transformer Architecture
2020, cited by this paper
FusionGAN: A generative adversarial network for infrared and visible image fusion
2019, cited by this paper
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
2018, cited by this paper
Infrared and visible image fusion methods and applications: A survey
2018, cited by this paper
Generative Adversarial Nets
2018, cited by this paper
Feature Selection based on PCA and PSO for Multimodal Medical Image Fusion using DTCWT
2017, cited by this paper
You Only Look Once: Unified, Real-Time Object Detection
2015, cited by this paper
Adam: A Method for Stochastic Optimization
2014, cited by this paper
Region-based multi-focus image fusion using the local spatial frequency
2013, cited by this paper
A new Contrast Based Image Fusion using Wavelet Packets
2008, cited by this paper
Image quality assessment: from error visibility to structural similarity
2004, cited by this paper
A Theory for Multiresolution Signal Decomposition: The Wavelet Representation
1989, cited by this paper