The Value Equivalence Principle for Model-Based Reinforcement Learning

Christopher Grimm, André Barreto, Satinder Singh, David Silver

Published 2020 in Neural Information Processing Systems

ABSTRACT

Learning models of the environment from data is often viewed as an essential component to building intelligent reinforcement learning (RL) agents. The common practice is to separate the learning of the model from its use, by constructing a model of the environment's dynamics that correctly predicts the observed state transitions. In this paper we argue that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based planning. As our main contribution, we introduce the principle of value equivalence: two models are value equivalent with respect to a set of functions and policies if they yield the same Bellman updates. We propose a formulation of the model learning problem based on the value equivalence principle and analyze how the set of feasible solutions is impacted by the choice of policies and functions. Specifically, we show that, as we augment the set of policies and functions considered, the class of value equivalent models shrinks, until eventually collapsing to a single point corresponding to a model that perfectly describes the environment. In many problems, directly modelling state-to-state transitions may be both difficult and unnecessary. By leveraging the value-equivalence principle one may find simpler models without compromising performance, saving computation and memory. We illustrate the benefits of value-equivalent model learning with experiments comparing it against more traditional counterparts like maximum likelihood estimation. More generally, we argue that the principle of value equivalence underlies a number of recent empirical successes in RL, such as Value Iteration Networks, the Predictron, Value Prediction Networks, TreeQN, and MuZero, and provides a first theoretical underpinning of those results.
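The definition in the abstract can be made concrete with a small tabular sketch. The code below is illustrative only and is not taken from the paper: the three-state, two-action problem, the helper names `bellman_update` and `value_equivalent`, the discount `GAMMA`, and all numbers are assumptions chosen for the example. It checks whether two models yield the same Bellman update for every policy and function in the given sets, and shows that adding a function to the set can break an equivalence that held before.

```python
# Hypothetical tabular sketch; names and numbers are illustrative assumptions.
GAMMA = 0.9  # assumed discount factor


def bellman_update(model, policy, v):
    """Apply the Bellman operator of a deterministic policy under `model` to `v`.

    model  = (r, p): r[s][a] is a reward, p[s][a][s2] a transition probability
    policy = list mapping each state index to an action index
    """
    r, p = model
    return [
        r[s][policy[s]]
        + GAMMA * sum(prob * v[s2] for s2, prob in enumerate(p[s][policy[s]]))
        for s in range(len(v))
    ]


def value_equivalent(model_a, model_b, policies, functions, tol=1e-9):
    """True if both models give the same Bellman update for every (policy, function) pair."""
    return all(
        abs(x - y) <= tol
        for pi in policies
        for v in functions
        for x, y in zip(bellman_update(model_a, pi, v), bellman_update(model_b, pi, v))
    )


# Shared rewards; the two models differ only in state 0 under action 0.
rewards = [[0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
p_env = [
    [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]
p_simple = [
    [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]],  # deterministic stand-in for the 50/50 split
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]
env = (rewards, p_env)
simple = (rewards, p_simple)

pi_a0 = [0, 0, 0]          # always take action 0
pi_a1 = [1, 0, 0]          # take action 1 in state 0
v_sym = [0.0, 2.0, 2.0]    # does not distinguish states 1 and 2
v_asym = [0.0, 2.0, 0.0]   # distinguishes states 1 and 2

same_on_small_set = value_equivalent(env, simple, [pi_a0, pi_a1], [v_sym])
same_on_larger_set = value_equivalent(env, simple, [pi_a0], [v_sym, v_asym])
same_when_policy_avoids_a0 = value_equivalent(env, simple, [pi_a1], [v_sym, v_asym])
```

In this toy setup the simpler deterministic model is indistinguishable from the environment as long as the function set cannot tell states 1 and 2 apart, or the policy set never uses the action on which the models differ. This mirrors, in miniature, the abstract's statement that the class of value equivalent models shrinks as the sets of policies and functions grow.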

PUBLICATION RECORD

Publication year
2020
Venue
Neural Information Processing Systems
Publication date
2020-11-06
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 2011.03506
Source metadata
Semantic Scholar

REFERENCES

Plannable Approximations to MDP Homomorphisms: Equivariance under Actions
2020cited by this paper
Model-Based Reinforcement Learning with Value-Targeted Regression
2020cited by this paper
The Value-Improvement Path: Towards Better Representations for Reinforcement Learning
2020cited by this paper
Learning discrete state abstractions with deep variational inference
2020cited by this paper
Policy-Aware Model Learning for Policy Gradient Methods
2020cited by this paper
The Value Function Polytope in Reinforcement Learning
2019cited by this paper
Scalable methods for computing state similarity in deterministic Markov Decision Processes
2019cited by this paper
Mastering Atari, Go, chess and shogi by planning with a learned model
2019influential reference
Learning Causal State Representations of Partially Observable Environments
2019cited by this paper
DeepMDP: Learning Continuous Latent Space Models for Representation Learning
2019cited by this paper
A Geometric Perspective on Optimal Representations for Reinforcement Learning
2019cited by this paper
Iterative Value-Aware Model Learning
2018influential reference
Deep Variational Reinforcement Learning for POMDPs
2018cited by this paper
SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
2018cited by this paper
Equivalence Between Wasserstein and Value-Aware Model-based Reinforcement Learning
2018cited by this paper
Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation
2018cited by this paper
Combined Reinforcement Learning via Abstract Representations
2018cited by this paper
Value Prediction Network
2017cited by this paper
Value-Aware Loss Function for Model-based Reinforcement Learning
2017influential reference
TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning
2017influential reference
Value Iteration Networks
2016influential reference
The Predictron: End-To-End Learning and Planning
2016influential reference
Value-Aware Loss Function for Model Learning in Reinforcement Learning
2016cited by this paper
Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
2015cited by this paper
Recurrent Models of Visual Attention
2014cited by this paper
Reinforcement learning with misspecified model classes
2013cited by this paper
Algorithms for Reinforcement Learning
2010cited by this paper
Neuro-Dynamic Programming
2009cited by this paper
Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping
2008cited by this paper
An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning
2008cited by this paper
Bounding Performance Loss in Approximate MDP Homomorphisms
2008cited by this paper
Towards a Unified Theory of State Abstraction for MDPs
2006cited by this paper
Metrics for Finite Markov Decision Processes
2004cited by this paper
Equivalence notions and model minimization in Markov decision processes
2003cited by this paper
Value-Directed Compression of POMDPs
2002cited by this paper
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
1999cited by this paper
Reinforcement Learning: An Introduction
1998cited by this paper
Model Minimization in Markov Decision Processes
1997cited by this paper
Artificial Intelligence: A Modern Approach
1996cited by this paper
Markov Decision Processes: Discrete Stochastic Dynamic Programming
1994cited by this paper
Learning to predict by the methods of temporal differences
1988cited by this paper
Neuronlike adaptive elements that can solve difficult learning control problems
1983cited by this paper
Value-Directed Belief State Approximation for POMDPs
2000cited by this paper
Approximate Homomorphisms: A framework for non-exact minimization in Markov Decision Processes
year unknowncited by this paper

CITED BY

On the Role of Iterative Computation in Reinforcement Learning
2026cites this paper
Calibrated Value-Aware Model Learning with Stochastic Environment Models
2025cites this paper
Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning
2025cites this paper
Calibrated Value-Aware Model Learning with Probabilistic Environment Models
2025cites this paper
EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph
2025cites this paper
Closing the Sim2Real Performance Gap in RL
2025cites this paper
Perspectives on optimizing transport systems with supply-dependent demand
2025cites this paper
Meta-Gradient Search Control: A Method for Improving the Efficiency of Dyna-style Planning
2024cites this paper
Learning Abstract World Model for Value-preserving Planning with Options
2024cites this paper
Bounding-Box Inference for Error-Aware Model-Based Reinforcement Learning
2024cites this paper
UniZero: Generalized and Efficient Planning with Scalable Latent World Models
2024cites this paper
Mitigating Value Hallucination in Dyna-Style Planning via Multistep Predecessor Models
2024cites this paper
Multi-Agent Imitation Learning: Value is Easy, Regret is Hard
2024cites this paper
ReZero: Boosting MCTS-based Algorithms by Just-in-Time and Speedy Reanalyze
2024cites this paper
Feasibility Consistent Representation Learning for Safe Reinforcement Learning
2024cites this paper
Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning
2024cites this paper
ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
2024cites this paper
Bridging State and History Representations: Understanding Self-Predictive RL
2024cites this paper
Skill or Luck? Return Decomposition via Advantage Functions
2024cites this paper
Policy-shaped prediction: avoiding distractions in model-based reinforcement learning
2024cites this paper
Decision-Focused Model-based Reinforcement Learning for Reward Transfer
2024cites this paper
MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
2024cites this paper
An Attentive Approach for Building Partial Reasoning Agents from Pixels
2024cites this paper
TaCoD: Tasks-Commonality-Aware World in Meta Reinforcement Learning
2024cites this paper
Research on operations and maintenance behavior audit based on DRL
2024cites this paper
HarmonyDream: Task Harmonization Inside World Models
2023cites this paper
Minimal Value-Equivalent Partial Models for Scalable and Robust Planning in Lifelong Reinforcement Learning
2023cites this paper
Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function
2023cites this paper
The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms
2023cites this paper
Decision-Focused Model-based Reinforcement Learning for Reward Transfer
2023influential citation
A Review of Symbolic, Subsymbolic and Hybrid Methods for Sequential Decision Making
2023cites this paper
Bayesian Reinforcement Learning With Limited Cognitive Load
2023cites this paper
Policy Gradient Methods in the Presence of Symmetries and State Abstractions
2023cites this paper
TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning via Transition Occupancy Matching
2023cites this paper
Query-Policy Misalignment in Preference-Based Reinforcement Learning
2023cites this paper
What model does MuZero learn?
2023cites this paper
Deep Generative Models for Decision-Making and Control
2023cites this paper
The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning
2023cites this paper
A Survey of Contextual Optimization Methods for Decision Making under Uncertainty
2023cites this paper
$\lambda$-models: Effective Decision-Aware Reinforcement Learning with Latent Models
2023cites this paper
Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning
2023influential citation
The Principle of Value Equivalence for Policy Gradient Search
2023influential citation
A Bayesian Approach to Robust Inverse Reinforcement Learning
2023cites this paper
Leveraging Value-awareness for Online and Offline Model-based Reinforcement Learning
2023influential citation
A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning
2023influential citation
Pixel State Value Network for Combined Prediction and Planning in Interactive Environments
2023cites this paper
Model gradient: unified model and policy learning in model-based reinforcement learning
2023cites this paper
Task-aware world model learning with meta weighting via bi-level optimization
2023cites this paper
Harmony World Models: Boosting Sample Efficiency for Model-based Reinforcement Learning
2023cites this paper
Planning in Stochastic Environments with a Learned Model
2022cites this paper
Planning with Theory of Mind
2022cites this paper
Continuous MDP Homomorphisms and Homomorphic Policy Gradient
2022cites this paper
Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective
2022cites this paper
Efficient Offline Policy Optimization with a Learned Model
2022cites this paper
Scaling up and Stabilizing Differentiable Planning with Implicit Differentiation
2022cites this paper
Integrating Symmetry into Differentiable Planning
2022cites this paper
Transfer RL across Observation Feature Spaces via Model-Based Regularization
2022cites this paper
Value Gradient Weighted Model-Based Reinforcement Learning
2022cites this paper
Approximate Value Equivalence
2022influential citation
Operator Splitting Value Iteration
2022cites this paper
Unsupervised Model-based Pre-training for Data-efficient Reinforcement Learning from Pixels
2022cites this paper
Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning
2022influential citation
On Rate-Distortion Theory in Capacity-Limited Cognition & Reinforcement Learning
2022cites this paper
A Look at Value-Based Decision-Time vs. Background Planning Methods Across Different Settings
2022cites this paper
Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning
2022influential citation
Between Rate-Distortion Theory & Value Equivalence in Model-Based Reinforcement Learning
2022influential citation
Should Models Be Accurate?
2022cites this paper
Value Gradient weighted Model-Based Reinforcement Learning
2022cites this paper
VIPer: Iterative Value-Aware Model Learning on the Value Improvement Path
2022cites this paper
Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy
2022cites this paper
Model-Advantage Optimization for Model-Based Reinforcement Learning
2021cites this paper
Reproducibility study of the Value Equivalence principle for Model Based Reinforcement Learning
2021cites this paper
Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation
2021influential citation
Parameter-free Gradient Temporal Difference Learning
2021cites this paper
Muesli: Combining Improvements in Policy Optimization
2021influential citation
Reinforcement Learning, Bit by Bit
2021cites this paper
Visualizing MuZero Models
2021influential citation
Procedural Generalization by Planning with Self-Supervised World Models
2021influential citation
Towards Robust Bisimulation Metric Learning
2021cites this paper
Self-Consistent Models and Values
2021cites this paper
Mismatched No More: Joint Model-Policy Optimization for Model-Based RL
2021cites this paper
High-accuracy model-based reinforcement learning, a survey
2021cites this paper
Proper Value Equivalence
2021influential citation
Model-Advantage and Value-Aware Models for Model-Based Reinforcement Learning: Bridging the Gap in Theory and Practice
2021influential citation
Model-Value Inconsistency as a Signal for Epistemic Uncertainty
2021cites this paper
On the role of planning in model-based deep reinforcement learning
2020cites this paper
Towards Continual Reinforcement Learning: A Review and Perspectives
2020cites this paper
Model-based Reinforcement Learning: A Survey
2020cites this paper
Optimistic Risk-Aware Model-based Reinforcement Learning
year unknowncites this paper
Robust Inverse Reinforcement Learning Through Bayesian Theory of Mind
year unknowncites this paper
Policy-shaped prediction: improving world modeling through interpretability
year unknowncites this paper