Med-VLM: Enhancing Medical Image Segmentation Accuracy Through Vision-Language Model

Yihao Zhao, Enhao Zhong, Cuiyun Yuan, Yang Li, Man Zhao, Chunxia Li, Jun Hu, Wei Liu, Chenbin Liu

Published 2025 in 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

ABSTRACT

We propose Med-VLM (Medical Vision-Language Model), an approach that leverages textual descriptions of organs to enhance segmentation accuracy in medical images. Existing medical image segmentation methods face several challenges: (1) current segmentation models often fail to incorporate valuable prior knowledge, such as detailed descriptions of organ locations and characteristics; (2) most text-visual models prioritize target identification rather than enhancing overall accuracy; (3) approaches that do attempt to use prior knowledge for accuracy enhancement often fall short in incorporating pre-trained models effectively. To overcome these limitations, Med-VLM introduces several key components: low-rank adaptation, authoritative organ descriptions, BioBERT weights, and a feature mixer. We conducted a comprehensive evaluation of Med-VLM on three authoritative medical image datasets covering the segmentation of various human body parts. Our method demonstrated superior performance compared to existing state-of-the-art approaches, including LViT, MedSAM, SAM, and nnU-Net. A series of ablation experiments systematically assessed the contribution of each component of Med-VLM, providing insight into the model's performance characteristics.
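One of the components the abstract names is low-rank adaptation (LoRA), a standard way to fine-tune a large pre-trained model by freezing its weights and training only a small low-rank update. The sketch below illustrates the general technique on a single linear layer; the class name, shapes, and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA sketch (assumed, not Med-VLM's actual code).

    Computes y = x @ (W + (alpha / r) * A @ B), where W is the frozen
    pretrained weight and only A (d_in x r) and B (r x d_out) would be
    trained, cutting trainable parameters from d_in*d_out down to
    r*(d_in + d_out).
    """

    def __init__(self, d_in, d_out, r=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)  # frozen
        self.A = rng.standard_normal((d_in, r)) * 0.01               # trainable
        self.B = np.zeros((r, d_out))                                # trainable, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        # Low-rank update added to the frozen weight at forward time.
        return x @ (self.W + self.scale * (self.A @ self.B))


layer = LoRALinear(d_in=16, d_out=16, r=4)
x = np.ones((1, 16))
# Because B is zero-initialized, the update A @ B is zero at the start of
# training, so the adapted layer initially reproduces the frozen layer.
assert np.allclose(layer(x), x @ layer.W)
```

Zero-initializing `B` is the usual LoRA convention: fine-tuning starts from the pre-trained model's exact behavior and the low-rank update is learned from there.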
