Pretraining has laid the foundation for the recent success of deep learning in multimodal medical image analysis. However, existing methods often overlook the semantic structure embedded in modality-specific representations, and supervised pretraining requires a carefully designed, time-consuming two-stage annotation process. To address this, we propose a novel semantic structure-preserving consistency method, named “Review of Free-Text Reports for Preserving Multimodal Semantic Structure” (RFPMSS). During the semantic structure training phase, we learn multiple anchors to capture the semantic structure of each modality, and sample-sample relationships are represented by associating samples with these anchors, forming modality-specific semantic relationships. For comprehensive modality alignment, RFPMSS extracts supervision signals from patient examination reports, establishing global alignment between images and text. Evaluations on datasets collected from Shanxi Provincial Cancer Hospital and Shanxi Provincial People’s Hospital demonstrate that our proposed cross-modal supervision using free-text image reports and multi-anchor allocation achieves state-of-the-art performance under highly limited supervision. Code: https://github.com/shichaoyu1/RFPMSS
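The anchor-based relationship modeling described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each modality's samples are soft-assigned to learned anchors via cosine similarity and a temperature-scaled softmax, and that cross-modal consistency is enforced with a symmetric KL term between the two assignment distributions. All function names and the choice of similarity/loss are illustrative assumptions.

```python
import numpy as np

def anchor_relation(features, anchors, tau=0.1):
    """Soft-assign each sample to modality-specific anchors.

    features: (n_samples, d) array; anchors: (n_anchors, d) array.
    Returns a (n_samples, n_anchors) row-stochastic assignment matrix.
    """
    # Cosine similarity between samples and anchors.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    logits = f @ a.T / tau
    # Subtract the row max for numerical stability before softmax.
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def structure_consistency_loss(p_img, p_txt, eps=1e-8):
    """Symmetric KL divergence between the image- and text-side
    anchor assignments (a stand-in for a cross-modal alignment term)."""
    def kl(p, q):
        return np.sum(p * np.log((p + eps) / (q + eps)), axis=1)
    return float(np.mean(kl(p_img, p_txt) + kl(p_txt, p_img)))
```

Under this sketch, identical assignment distributions on both modalities yield a loss near zero, and the loss grows as the image- and text-side semantic structures diverge.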
Semantic structure preservation for accurate multi-modal glioma diagnosis
Chaoyu Shi, Xia Zhang, Runzhen Zhao, Wen Zhang, Fei Chen
Published 2025 in Scientific Reports
PUBLICATION RECORD
- Publication year: 2025
- Venue: Scientific Reports
- Publication date: 2025-02-28
- Fields of study: Medicine, Computer Science
- Source metadata: Semantic Scholar, PubMed