Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning
Vladimir Feinberg, Alvin Wan, Ion Stoica, Michael I. Jordan, Joseph E. Gonzalez, S. Levine
Published 2018 in arXiv.org
ABSTRACT
Recent model-free reinforcement learning algorithms have proposed incorporating learned dynamics models as a source of additional data with the intention of reducing sample complexity. Such methods hold the promise of incorporating imagined data coupled with a notion of model uncertainty to accelerate the learning of continuous control tasks. Unfortunately, they rely on heuristics that limit usage of the dynamics model. We present model-based value expansion, which controls for uncertainty in the model by only allowing imagination to a fixed depth. By enabling wider use of learned dynamics models within a model-free reinforcement learning algorithm, we improve value estimation, which, in turn, reduces the sample complexity of learning.
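The fixed-depth expansion the abstract describes can be written as a value target: roll the learned dynamics model forward H steps under the current policy, sum the discounted imagined rewards, and bootstrap with the learned value function at the final imagined state, i.e. V_H(s_0) = sum_{t=0}^{H-1} gamma^t r(s_t, a_t) + gamma^H V(s_H). Below is a minimal Python sketch of that computation under stated assumptions; the callables `model`, `reward_fn`, `value_fn`, and `policy`, and the function name `mve_target`, are illustrative placeholders, not the paper's implementation.

```python
def mve_target(s0, policy, model, reward_fn, value_fn, horizon, gamma=0.99):
    """H-step model-based value expansion target for a single start state.

    Imagines `horizon` steps with the learned dynamics `model` under the
    current `policy`, accumulating discounted model-predicted rewards, then
    bootstraps with the learned value function at the final imagined state.
    """
    s = s0
    target = 0.0
    discount = 1.0
    for _ in range(horizon):
        a = policy(s)                         # action from the current policy
        target += discount * reward_fn(s, a)  # imagined reward at step t
        s = model(s, a)                       # one-step imagined transition
        discount *= gamma                     # discount becomes gamma^(t+1)
    # beyond depth H, trust the model-free value estimate instead of the model
    return target + discount * value_fn(s)
```

In an actor-critic setting, a target of this form would stand in for the usual one-step temporal-difference target when training the value function; capping imagination at depth H is what bounds how far model error can compound, which is the abstract's point about controlling for model uncertainty.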
PUBLICATION RECORD
- Publication year: 2018
- Venue: arXiv.org
- Publication date: 2018-02-28
- Fields of study: Mathematics, Computer Science
- Source metadata: Semantic Scholar