MMMF: Mask-Decoupled Multiscale Mamba Fusion Framework for SAR and Visible Images
Yunzhong Yan, Jun Li, La Jiang, Shuowei Liu, Zhen Liu
Published 2026 in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ABSTRACT
The inherent differences between the imaging mechanisms of synthetic aperture radar (SAR) and visible sensors result in substantial disparities in modality characteristics, posing significant challenges for high-quality image fusion. Current multisource image fusion techniques fall short of addressing this critical issue of modality difference. The emerging Mamba model has demonstrated remarkable potential across various image-related tasks, but it lacks a cross-attention-like mechanism, a crucial design element for the fusion process. To bridge these gaps, we introduce a frequency-domain Mask-Decoupled Multiscale Mamba Fusion (MMMF) framework for SAR and visible images. The MMMF framework comprises two key components. The first, mask decoupling for coarse fusion, uses two complementary circular masks to disentangle high- and low-frequency features; a dedicated frequency-domain cross-phase module then interlaces the phases of the SAR and visible modalities, enabling lightweight coarse fusion. The second component is multiscale Mamba fusion. Here, the Modal-Interactive Mamba Module integrates a consistency gating mechanism, harmonizing the heterogeneous State Space Model (SSM) with a four-directional Cross-SSM (CSM); this reduces modality heterogeneity and strengthens feature interaction between SAR and visible images. In addition, the Frequency-Coupled Mamba Module, equipped with a cross-gated attention mechanism, models features at both the sequence and matrix levels, achieving seamless integration of the high- and low-frequency components. Furthermore, we establish a novel constraint scheme that operates jointly in the spatial and frequency domains. By imposing constraints on the frequency-domain coarse fusion, this approach preserves spatial authenticity while enhancing spectral fidelity, enabling a more comprehensive fusion of source-image features. Extensive fusion quality evaluations were performed on three datasets with varying resolutions and image sizes, YYX-OPT-SAR, WHU-OPT-SAR, and the multimodal LCC dataset, benchmarking the proposed MMMF against 13 state-of-the-art techniques; the results demonstrate its effectiveness. On the multimodal LCC dataset, MMMF was further compared with three advanced fusion methods in land cover classification and robustness tests, highlighting its potential for downstream applications and its robustness. Comprehensive ablation studies confirm the contribution of each module within the proposed framework.
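To make the coarse-fusion stage easier to picture, the sketch below shows one plausible reading of the two operations the abstract describes: splitting an image into low- and high-frequency parts with complementary circular masks on its FFT spectrum, and interlacing the phase spectra of the SAR and visible inputs. This is a hypothetical illustration based only on the abstract; the function names, the `radius` hyper-parameter, and the exact phase-interlacing rule are assumptions, not the paper's implementation.

```python
import torch

def circular_masks(h, w, radius, device=None):
    # Complementary circular masks on a centred (fftshift-ed) spectrum:
    # frequencies within `radius` of the centre form the low-pass mask,
    # everything else the high-pass mask. `radius` is an assumed hyper-parameter.
    ys = torch.arange(h, device=device).float().view(-1, 1) - h / 2.0
    xs = torch.arange(w, device=device).float().view(1, -1) - w / 2.0
    dist = torch.sqrt(ys ** 2 + xs ** 2)
    low = (dist <= radius).float()
    return low, 1.0 - low

def mask_decouple(img, radius=16):
    # Split a (B, C, H, W) image into low- and high-frequency components by
    # masking its centred 2-D spectrum and inverting each masked part.
    _, _, h, w = img.shape
    spec = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    low_mask, high_mask = circular_masks(h, w, radius, device=img.device)
    low = torch.fft.ifft2(torch.fft.ifftshift(spec * low_mask, dim=(-2, -1))).real
    high = torch.fft.ifft2(torch.fft.ifftshift(spec * high_mask, dim=(-2, -1))).real
    return low, high

def cross_phase(sar, vis):
    # One possible phase-interlacing rule: keep each modality's amplitude
    # spectrum but borrow the other modality's phase before inverting.
    f_sar, f_vis = torch.fft.fft2(sar), torch.fft.fft2(vis)
    sar_mix = torch.abs(f_sar) * torch.exp(1j * torch.angle(f_vis))
    vis_mix = torch.abs(f_vis) * torch.exp(1j * torch.angle(f_sar))
    return torch.fft.ifft2(sar_mix).real, torch.fft.ifft2(vis_mix).real
```

In the full framework, the coarse-fused frequency components produced this way would then feed the multiscale Mamba modules; the sketch stops at the coarse-fusion stage.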
PUBLICATION RECORD
- Publication year: 2026
- Venue: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
- Fields of study: Computer Science, Engineering, Environmental Science
- Source metadata: Semantic Scholar