Fine-Tuning Vision-Language Models for Multimodal Polymer Property Prediction
Angela L. Vuong, Minh-Hao Van, Prateek Verma, Chen Zhao, Xintao Wu
Published 2025 in BigData Congress [Services Society]
ABSTRACT
Vision-Language Models (VLMs) have shown strong performance on tasks such as visual question answering and multimodal text generation, but their effectiveness in scientific domains such as materials science remains limited. While some machine learning methods address specific challenges in this field, foundation models designed for broad tasks such as polymer property prediction from multimodal data are still lacking. In this work, we present a multimodal polymer dataset for fine-tuning VLMs through instruction-tuning pairs and assess the impact of multimodality on prediction performance. Our models, fine-tuned with LoRA, outperform unimodal and baseline approaches, demonstrating the benefits of multimodal learning. Moreover, this approach reduces the need to train a separate model for each property, lowering deployment and maintenance costs.
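As a rough illustration of the approach described in the abstract, the sketch below shows LoRA fine-tuning of a vision-language model on instruction-tuning pairs using Hugging Face Transformers and PEFT. The base checkpoint, target modules, rank, and the example pair are illustrative assumptions, not the configuration or data reported in the paper.

# Hypothetical sketch: LoRA fine-tuning of a VLM for property prediction.
# All names and hyperparameters here are placeholders, not the paper's setup.
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "llava-hf/llava-1.5-7b-hf"  # placeholder base VLM checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

# LoRA injects trainable low-rank adapters into the attention projections;
# the frozen backbone is shared across all polymer properties, so one model
# can serve multiple prediction tasks.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train

# One instruction-tuning pair couples a polymer image (e.g., a structure
# diagram) with a property-prediction instruction and the target value:
# example = {"image": ..., "instruction": "Predict the glass transition
#            temperature of this polymer.", "answer": "..."}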
PUBLICATION RECORD
- Publication year: 2025
- Venue: BigData Congress [Services Society]
- Publication date: 2025-11-04
- Fields of study: Physics, Materials Science, Computer Science
- Source metadata: Semantic Scholar