Foundation Models Meet Medical Image Interpretation

Licheng Jiao, Jiayao Hao, Ruiyang Li, Lingling Li, Xu Liu, Fang Liu, Wenping Ma, Puhua Chen, Zhongjian Huang, Jingyi Yang, Jiaxuan Zhao, Qigong Sun

Published 2026 in Research

ABSTRACT

Medical deep learning faces challenges such as limited annotated data and insufficient model generalization. In response, foundation models (FMs) are reshaping the paradigm of medical image interpretation through large-scale pretraining and efficient fine-tuning. Traditional models target a single modality and task. FMs, by contrast, support multi-modal representation and task-agnostic transfer, which lets them adapt to diverse downstream applications without extensive annotation or retraining. This paper systematically reviews research progress on medical FMs, focusing on medical tasks, datasets, and evaluation metrics. It covers key interpretation tasks such as classification, segmentation, generation, and prognosis prediction. At the data level, it surveys multi-source data, including 2-dimensional (2D) and 3D medical images, vision-language data, electronic health records (EHRs), physiological signals, and bioinformatics data, and it summarizes the evaluation metrics used for each task. Building on this foundation, the paper categorizes and analyzes mainstream medical FMs, including pretrained models, vision FMs, vision-language FMs, and extended multi-modal FMs, and systematically compares their performance and characteristics. We also propose the IPIU medical FM platform, which integrates large-scale medical data, universal vision models, medical vision-language models, and medical large language models, and we verify its effectiveness on typical clinical tasks. This work is the first to systematically analyze the key challenges and emerging trends of medical FMs across 12 critical dimensions, including data, modeling, security, and computational resources. It thereby fills gaps in existing reviews with respect to systematic organization and forward-looking analysis. Our aim is to provide theoretical support and a practical reference for the sustainable development of medical FMs.
Related resources and literature lists will be open-sourced at https://github.com/JYAOii/Foundation-Models-meet-Medical-Image-Interpretation.

PUBLICATION RECORD

Publication year: 2026
Venue: Research
Publication date: 2026-01-01
Fields of study: Medicine, Computer Science
Identifiers: DOI 10.34133/research.1024; PMID 41767596; PMCID PMC12946388
Source metadata: Semantic Scholar
