Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning

Ruizhe Shi, Yuyao Liu, Yanjie Ze, Simon S. Du, Huazhe Xu

Published 2023 in International Conference on Learning Representations

ABSTRACT

Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets. In real-world scenarios, data collection can be costly and risky; offline RL therefore becomes particularly challenging when in-domain data is limited. Given recent advances in Large Language Models (LLMs) and their few-shot learning prowess, this paper introduces Language Models for Motion Control (LaMo), a general framework based on Decision Transformers that uses pre-trained Language Models (LMs) effectively for offline RL. The framework highlights four crucial components: (1) initializing Decision Transformers with sequentially pre-trained LMs, (2) employing the LoRA fine-tuning method, in contrast to full-weight fine-tuning, to combine the pre-trained knowledge from LMs with in-domain knowledge effectively, (3) using a non-linear MLP transformation instead of linear projections to generate embeddings, and (4) integrating an auxiliary language prediction loss during fine-tuning to stabilize the LMs and retain their original language abilities. Empirical results indicate that LaMo achieves state-of-the-art performance in sparse-reward tasks and closes the gap between value-based offline RL methods and Decision Transformers in dense-reward tasks. In particular, the method demonstrates superior performance in scenarios with limited data samples. Project website: https://lamo2023.github.io
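
As a rough illustration of components (2) and (4) of the abstract, the following minimal sketch shows a LoRA-style layer (a frozen weight plus a trainable low-rank update) and a weighted sum of a decision loss and a language loss. It is not the authors' implementation: the names LoRALinear, alpha, combined_loss, and lam are hypothetical, and the additive form of the loss with weight lam is an assumption that the abstract does not spell out.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration only; not taken from the paper's code.
    """

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # pre-trained weight, kept frozen during fine-tuning
        # A starts small and random, B starts at zero, so the adapted layer
        # initially computes exactly the same output as the pre-trained one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        update = matvec(self.B, matvec(self.A, x))  # low-rank trainable path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        """Count only the entries of A and B, the parameters LoRA would train."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def combined_loss(decision_loss, language_loss, lam=0.1):
    """Assumed form of the objective: decision loss plus a weighted language loss."""
    return decision_loss + lam * language_loss


if __name__ == "__main__":
    W = [[1.0, 2.0, 3.0, 4.0] for _ in range(4)]
    layer = LoRALinear(W, rank=1)
    print(layer.forward([1.0, 0.0, -1.0, 0.5]))
    print(layer.trainable_params(), "trainable vs", len(W) * len(W[0]), "frozen")
```

With rank 1 on a 4x4 weight, the sketch trains 8 parameters instead of 16; the saving grows with layer size, which is the usual motivation for low-rank fine-tuning when in-domain data is limited.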

PUBLICATION RECORD

Publication year: 2023
Venue: International Conference on Learning Representations
Publication date: 2023-10-31
Fields of study: Computer Science
Identifiers: DOI 10.48550/arXiv.2310.20587; arXiv 2310.20587
Source metadata: Semantic Scholar

REFERENCES

LLaMA: Open and Efficient Foundation Language Models (2023, cited by this paper)
Prompt a Robot to Walk with Large Language Models (2023, cited by this paper)
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (2023, cited by this paper)
SayTap: Language to Quadrupedal Locomotion (2023, cited by this paper)
Emergent Agentic Transformer from Chain of Hindsight Experience (2023, cited by this paper)
Future-conditioned Unsupervised Pretraining for Decision Transformer (2023, cited by this paper)
READ: Recurrent Adaptation of Large Transformers (2023, cited by this paper)
When should we prefer Decision Transformers for Offline Reinforcement Learning? (2023, cited by this paper)
Revisiting the Minimalist Approach to Offline Reinforcement Learning (2023, cited by this paper)
Prompt-Tuning Decision Transformer with Preference Ranking (2023, cited by this paper)
TidyBot: Personalized Robot Assistance with Large Language Models (2023, cited by this paper)
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention (2023, cited by this paper)
Text2Motion: from natural language instructions to feasible plans (2023, cited by this paper)
Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning (2023, cited by this paper)
GPT-4 Technical Report (2023, influential reference)
Decision Transformer under Random Frame Dropping (2023, cited by this paper)
PaLM-E: An Embodied Multimodal Language Model (2023, cited by this paper)
Prompting Decision Transformer for Few-Shot Policy Generalization (2022, cited by this paper)
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents (2022, cited by this paper)
Can Wikipedia Help Offline Reinforcement Learning? (2022, cited by this paper)
Training language models to follow instructions with human feedback (2022, influential reference)
Pre-Trained Language Models for Interactive Decision-Making (2022, influential reference)
A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems (2022, cited by this paper)
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning (2022, cited by this paper)
A Generalist Agent (2022, influential reference)
Large Language Models are Zero-Shot Reasoners (2022, influential reference)
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning (2022, cited by this paper)
Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL (2022, cited by this paper)
CORL: Research-oriented Deep Offline Reinforcement Learning Library (2022, cited by this paper)
Visual Reinforcement Learning With Self-Supervised 3D Representations (2022, cited by this paper)
Scaling Instruction-Finetuned Language Models (2022, cited by this paper)
In-context Reinforcement Learning with Algorithm Distillation (2022, cited by this paper)
On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline (2022, cited by this paper)
RT-1: Robotics Transformer for Real-World Control at Scale (2022, cited by this paper)
Prefix-Tuning: Optimizing Continuous Prompts for Generation (2021, cited by this paper)
d3rlpy: An Offline Deep Reinforcement Learning Library (2021, cited by this paper)
Offline Reinforcement Learning with Implicit Q-Learning (2021, influential reference)
LoRA: Low-Rank Adaptation of Large Language Models (2021, influential reference)
A Minimalist Approach to Offline Reinforcement Learning (2021, influential reference)
Offline Reinforcement Learning as One Big Sequence Modeling Problem (2021, cited by this paper)
Decision Transformer: Reinforcement Learning via Sequence Modeling (2021, influential reference)
The Power of Scale for Parameter-Efficient Prompt Tuning (2021, cited by this paper)
Transformers in Vision: A Survey (2021, cited by this paper)
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems (2020, cited by this paper)
D4RL: Datasets for Deep Data-Driven Reinforcement Learning (2020, influential reference)
Language Models are Few-Shot Learners (2020, influential reference)
Conservative Q-Learning for Offline Reinforcement Learning (2020, influential reference)
Generative Pretraining From Pixels (2020, cited by this paper)
Mastering Atari with Discrete World Models (2020, cited by this paper)
Training data-efficient image transformers & distillation through attention (2020, cited by this paper)
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning (2020, cited by this paper)
Reservoir Transformers (2020, cited by this paper)
Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning (2019, influential reference)
Language Models are Unsupervised Multitask Learners (2019, influential reference)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2019, cited by this paper)
Improving Language Understanding by Generative Pre-Training (2018, influential reference)
Off-Policy Deep Reinforcement Learning without Exploration (2018, cited by this paper)
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (2018, cited by this paper)
Attention is All you Need (2017, cited by this paper)
Proximal Policy Optimization Algorithms (2017, cited by this paper)
Gaussian Error Linear Units (GELUs) (2016, cited by this paper)
Pointer Sentinel Mixture Models (2016, influential reference)
ImageNet Large Scale Visual Recognition Challenge (2014, cited by this paper)
MuJoCo: A physics engine for model-based control (2012, influential reference)
The Arcade Learning Environment: An Evaluation Platform for General Agents (2012, influential reference)
What is the best multi-stage architecture for object recognition? (2009, cited by this paper)
Language Models (2009, cited by this paper)
Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method (2005, cited by this paper)

CITED BY

MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance (2026, cites this paper)
Prompt Tuning Decision Transformers with Structured and Scalable Bandits (2025, cites this paper)
K-Bloom: unleashing the power of pre-trained language models in extracting knowledge graph with predefined relations (2025, cites this paper)
Efficient Multi-agent Offline Coordination via Diffusion-based Trajectory Stitching (2025, cites this paper)
Text-to-Decision Agent: Learning Generalist Policies from Natural Language Supervision (2025, cites this paper)
HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving (2025, cites this paper)
Large Language Model-enhanced Reinforcement Learning for Low-Altitude Economy Networking (2025, cites this paper)
Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision (2025, cites this paper)
ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning (2025, cites this paper)
Agentic Episodic Control (2025, cites this paper)
Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends (2025, cites this paper)
VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning (2025, cites this paper)
Large Model Empowered Embodied AI: A Survey on Decision-Making and Embodied Learning (2025, cites this paper)
M-SAT: Multi-State-Action Tokenisation in Decision Transformers for Multi-Discrete Actions (2025, cites this paper)
Reward Design for Reinforcement Learning in the Development of Large Language Models (2025, cites this paper)
Embodied Intelligence for Flexible Manufacturing: A Survey (2025, cites this paper)
Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving (2024, cites this paper)
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons (2024, cites this paper)
Guiding Reinforcement Learning Using Uncertainty-Aware Large Language Models (2024, cites this paper)
Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods (2024, cites this paper)
Decomposed Prompt Decision Transformer for Efficient Unseen Task Generalization (2024, cites this paper)
RiskAwareBench: Towards Evaluating Physical Risk Awareness for High-level Planning of LLM-based Embodied Agents (2024, cites this paper)
Decision Transformer as a Foundation Model for Partially Observable Continuous Control (2024, influential citation)
A Survey of Language-Based Communication in Robotics (2024, cites this paper)
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL (2024, cites this paper)
Grounding Multimodal Large Language Models in Actions (2024, cites this paper)
Multi-State-Action Tokenisation in Decision Transformers for Multi-Discrete Action Spaces (2024, cites this paper)
Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling (2024, cites this paper)
iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement (2024, cites this paper)
Multi-Task Reinforcement Learning with Cost-based HTN Planning (2024, cites this paper)
Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer (2024, cites this paper)
TinyVLA: Toward Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation (2024, cites this paper)
Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration (2024, cites this paper)
Integrating Reinforcement Learning and Large Language Models for Crop Production Process Management Optimization and Control through A New Knowledge-Based Deep Learning Paradigm (2024, cites this paper)
Zero-shot Model-based Reinforcement Learning using Large Language Models (2024, cites this paper)
Learning from models beyond fine-tuning (2023, cites this paper)