Dynamic policy programming

Published 2010 in Journal of Machine Learning Research

ABSTRACT

In this paper, we propose a novel policy iteration method, called dynamic policy programming (DPP), to estimate the optimal policy in infinite-horizon Markov decision processes. We prove finite-iteration and asymptotic ℓ∞-norm performance-loss bounds for DPP in the presence of approximation/estimation error. The bounds are expressed in terms of the ℓ∞-norm of the average accumulated error, as opposed to the ℓ∞-norm of the error itself in the case of standard approximate value iteration (AVI) and approximate policy iteration (API). This suggests that DPP can achieve better performance than AVI and API, since it averages out the simulation noise caused by Monte-Carlo sampling throughout the learning process. We examine these theoretical results numerically by comparing the performance of the approximate variants of DPP with existing reinforcement learning (RL) methods on different problem domains. Our results show that, in all cases, DPP-based algorithms outperform other RL methods by a wide margin.
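To make the iteration concrete, here is a minimal Python/NumPy sketch of exact DPP on a small, fully known finite MDP. The abstract does not state the update rule. The recursion used below is our reading of the DPP action-preference update and should be checked against the paper: Ψ_{k+1}(x,a) = Ψ_k(x,a) + r(x,a) + γ Σ_y P(y|x,a) M_η Ψ_k(y) − M_η Ψ_k(x), where M_η Ψ(x) is the Boltzmann-weighted average of Ψ(x,·) at inverse temperature η. The helper names `boltzmann_average` and `dpp`, the toy two-state MDP, and the parameter values are hypothetical and chosen only for illustration. No approximation or sampling noise is modeled, so the sketch shows the exact recursion that the approximate variants start from, not the error-averaging result itself.

```python
import numpy as np


def boltzmann_average(psi, eta):
    """Return (M_eta Psi, pi): the Boltzmann-weighted average of Psi(x, .) and
    the softmax policy pi(a|x) proportional to exp(eta * Psi(x, a))."""
    # Subtract the per-state maximum so exp() never overflows as preferences grow.
    z = eta * (psi - psi.max(axis=1, keepdims=True))
    w = np.exp(z)
    pi = w / w.sum(axis=1, keepdims=True)
    return (pi * psi).sum(axis=1), pi


def dpp(P, R, gamma, eta, iters):
    """Exact (noise-free) DPP iterations on a finite MDP (illustrative sketch).

    P: transition probabilities, shape (S, A, S), with P[x, a, y] = P(y | x, a).
    R: expected rewards, shape (S, A).
    Returns the action preferences Psi and the induced softmax policy.
    """
    psi = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        m, _ = boltzmann_average(psi, eta)
        # Psi_{k+1}(x,a) = Psi_k(x,a) + r(x,a) + gamma * sum_y P(y|x,a) M Psi_k(y) - M Psi_k(x)
        psi = psi + R + gamma * (P @ m) - m[:, None]
    _, pi = boltzmann_average(psi, eta)
    return psi, pi


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP with deterministic transitions.
    # State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 0).
    # State 1: action 0 stays (reward 1), action 1 moves to state 0 (reward 0).
    P = np.zeros((2, 2, 2))
    P[0, 0, 0] = P[0, 1, 1] = P[1, 0, 1] = P[1, 1, 0] = 1.0
    R = np.array([[0.0, 0.0], [1.0, 0.0]])
    psi, pi = dpp(P, R, gamma=0.9, eta=1.0, iters=500)
    print("greedy actions:", pi.argmax(axis=1))
    print("policy:\n", np.round(pi, 4))
```

On this toy MDP the returned policy should put almost all of its mass on action 1 in state 0 and action 0 in state 1. That is the optimal policy for γ = 0.9, with V*(1) = 1/(1 − 0.9) = 10 and V*(0) = 0.9 · 10 = 9. The preferences of suboptimal actions keep decreasing with k, which is why the softmax subtracts the per-state maximum before exponentiating.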

PUBLICATION RECORD

Publication year
2010
Venue
Journal of Machine Learning Research
Publication date
2010-04-12
Fields of study
Mathematics, Computer Science
Identifiers
DOI 10.5555/2503308.2503344 arXiv 1004.2027
Source metadata
Semantic Scholar

REFERENCES

Hierarchical Relative Entropy Policy Search
2014 · cited by this paper
An information-theoretic approach to curiosity-driven reinforcement learning
2012 · cited by this paper
Speedy Q-Learning
2011 · cited by this paper
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding (Advances in Neural Information Processing Systems, MIT Press)
2010 · cited by this paper
Relative Entropy Policy Search
2010 · influential reference
Algorithms for Reinforcement Learning
2010 · cited by this paper
Model-based reinforcement learning with nearly tight exploration complexity bounds
2010 · cited by this paper
Toward Off-Policy Learning Control with Function Approximation
2010 · cited by this paper
Error Propagation for Approximate Policy and Value Iteration
2010 · cited by this paper
Dynamic Programming and Optimal Control, 3rd Edition, Volume II
2010 · cited by this paper
Neuro-Dynamic Programming
2009 · influential reference
Natural actor-critic algorithms
2009 · cited by this paper
Reinforcement Learning in Finite MDPs: PAC Analysis
2009 · cited by this paper
REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs
2009 · cited by this paper
Optimal control as a graphical model inference problem
2009 · cited by this paper
Model-free reinforcement learning as mixture learning
2009 · cited by this paper
Finite-Time Bounds for Fitted Value Iteration
2008 · influential reference
Regularized Policy Iteration
2008 · cited by this paper
Near-optimal Regret Bounds for Reinforcement Learning
2008 · cited by this paper
An analysis of reinforcement learning with function approximation
2008 · cited by this paper
Regularized Fitted Q-iteration: Application to Planning
2008 · cited by this paper
Dual Representations for Dynamic Programming and Reinforcement Learning
2007 · cited by this paper
Stable Dual Dynamic Programming
2007 · cited by this paper
Fitted Q-iteration in continuous action-space MDPs
2007 · cited by this paper
Prediction, learning, and games
2006 · cited by this paper
Linearly-solvable Markov decision problems
2006 · cited by this paper
Path integrals and symmetry breaking for optimal control theory
2005 · cited by this paper
Tree-Based Batch Mode Reinforcement Learning
2005 · cited by this paper
Error Bounds for Approximate Value Iteration
2005 · cited by this paper
Interpolation-based Q-learning
2004 · cited by this paper
Information Theory, Inference, and Learning Algorithms
2004 · cited by this paper
Learning Rates for Q-learning
2004 · influential reference
Natural Actor-Critic
2003 · cited by this paper
Least-Squares Policy Iteration
2003 · cited by this paper
Covariant policy search
2003 · cited by this paper
A Convergent Form of Approximate Policy Iteration
2002 · cited by this paper
An Introduction to Reinforcement Learning Theory: Value Function Methods
2002 · cited by this paper
Infinite-Horizon Policy-Gradient Estimation
2001 · cited by this paper
A Natural Policy Gradient
2001 · cited by this paper
On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning
2000 · cited by this paper
Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
2000 · cited by this paper
The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
2000 · cited by this paper
Policy Gradient Methods for Reinforcement Learning with Function Approximation
1999 · cited by this paper
Actor-Critic Algorithms
1999 · influential reference
Reinforcement Learning: An Introduction
1998 · cited by this paper
Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms
1998 · cited by this paper
Introduction to Reinforcement Learning
1998 · cited by this paper
Simulating Normalizing Constants: From Importance Sampling to Bridge Sampling to Path Sampling
1998 · cited by this paper
The Asymptotic Convergence-Rate of Q-learning
1997 · cited by this paper
Dynamic Programming and Optimal Control, Two Volume Set
1995 · cited by this paper
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
1994 · cited by this paper
Complexity Analysis of Real-Time Reinforcement Learning
1992 · cited by this paper
Q-learning
1992 · influential reference
Neuronlike adaptive elements that can solve difficult learning control problems
1983 · cited by this paper
The Law of Large Numbers and the Central Limit Theorem in Banach Spaces
1976 · cited by this paper
Dynamic Programming and Optimal Control, Vol. II
1976 · cited by this paper
Policy Search for Motor Primitives in Robotics
year unknown · cited by this paper

CITED BY

Batch Normalized Reinforcement Learning with Relative Entropy Regularization for Superior Efficiency and Robustness
2025 · cites this paper
Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction
2025 · cites this paper
Effective Reinforcement Learning with Smooth Policy Update and Informative State Action Representation
2025 · cites this paper
A Minimal-Assumption Analysis of Q-Learning with Time-Varying Policies
2025 · cites this paper
Efficient Model Based Reinforcement Learning Control Using Relative Entropy Regularization
2025 · influential citation
MAVIS: Multi-Objective Alignment via Inference-Time Value-Guided Selection
2025 · cites this paper
Partially Observable Reference Policy Programming: Solving POMDPs Sans Numerical Optimisation
2025 · cites this paper
Reducing the value function over-estimation by Kullback-Leibler divergence regularized distributional actor-critic
2025 · influential citation
Effective Reinforcement Learning Control using Conservative Soft Actor-Critic
2025 · cites this paper
Mirror Descent Actor Critic via Bounded Advantage Learning
2025 · cites this paper
Deep reinforcement learning based energy management strategies for electrified vehicles: Recent advances and perspectives
2024 · cites this paper
Enhancing Industrial Process Control: Integrating Intelligent Digital Twin Technology with Proportional-Integral-Derivative Regulators
2024 · cites this paper
Scaling Long-Horizon Online POMDP Planning via Rapid State Space Sampling
2024 · cites this paper
Structure Matters: Dynamic Policy Gradient
2024 · cites this paper
Offline Reinforcement Learning via Tsallis Regularization
2024 · cites this paper
Optimality Theory of Stigmergic Collective Information Processing by Chemotactic Cells
2024 · cites this paper
Sparse randomized policies for Markov decision processes based on Tsallis divergence regularization
2024 · cites this paper
Bisimulation Metrics are Optimal Transport Distances, and Can be Computed Efficiently
2024 · cites this paper
Reward-Punishment Reinforcement Learning with Maximum Entropy
2024 · cites this paper
Efficient distributional reinforcement learning with Kullback-Leibler divergence regularization
2023 · influential citation
Generalized Munchausen Reinforcement Learning using Tsallis KL Divergence
2023 · influential citation
Cautious policy programming: exploiting KL regularization for monotonic policy improvement in reinforcement learning
2023 · influential citation
Effective Multi-Agent Deep Reinforcement Learning Control With Relative Entropy Regularization
2023 · influential citation
General Munchausen Reinforcement Learning with Tsallis Kullback-Leibler Divergence
2023 · cites this paper
Reinforcement Learning Based Gasoline Blending Optimization: Achieving More Efficient Nonlinear Online Blending of Fuels
2023 · cites this paper
The Point to Which Soft Actor-Critic Converges
2023 · cites this paper
Enforcing KL Regularization in General Tsallis Entropy Reinforcement Learning via Advantage Learning
2022 · cites this paper
Risk-Sensitive Reinforcement Learning With Exponential Criteria
2022 · cites this paper
Adaptive Tsallis Entropy Regularization for Efficient Reinforcement Learning
2022 · cites this paper
Learning-based control approaches for service robots on cloth manipulation and dressing assistance: a comprehensive review
2022 · cites this paper
Dynamic Policy Programming with Descending Regularization for Efficient Reinforcement Learning Control
2022 · influential citation
Goal-Aware Generative Adversarial Imitation Learning from Imperfect Demonstration for Robotic Cloth Manipulation
2022 · cites this paper
A survey of inverse reinforcement learning
2022 · cites this paper
q-Munchausen Reinforcement Learning
2022 · cites this paper
On linear and super-linear convergence of Natural Policy Gradient algorithm
2022 · cites this paper
Smoothing Advantage Learning
2022 · cites this paper
Alleviating parameter-tuning burden in reinforcement learning for large-scale process control
2022 · cites this paper
Revisiting Peng's Q(λ) for Modern Reinforcement Learning
2021 · cites this paper
Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
2021 · cites this paper
Robust Entropy-regularized Markov Decision Processes
2021 · cites this paper
ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives
2021 · influential citation
Finding Near Optimal Policies via Reducive Regularization in Markov Decision Processes
2021 · cites this paper
Constrained stochastic optimal control with learned importance sampling: A path integral approach
2021 · cites this paper
Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization
2021 · cites this paper
Binarized P-Network: Deep Reinforcement Learning of Robot Control from Raw Images on FPGA
2021 · cites this paper
Geometric Value Iteration: Dynamic Error-Aware KL Regularization for Reinforcement Learning
2021 · influential citation
Cautious Policy Programming: Exploiting KL Regularization in Monotonic Policy Improvement for Reinforcement Learning
2021 · influential citation
Cautious Actor-Critic
2021 · influential citation
Bregman Gradient Policy Optimization
2021 · cites this paper
Finite-Sample Analysis of Off-Policy Natural Actor–Critic With Linear Function Approximation
2021 · cites this paper
Safe Policy Iteration: A Monotonically Improving Approximate Policy Iteration Approach
2021 · influential citation
On the Linear Convergence of Natural Policy Gradient Algorithm
2021 · cites this paper
Build complementary models on human feedback for simulation to the real world
2021 · cites this paper
Shiftable Dynamic Policy Programming for Efficient and Robust Reinforcement Learning Control
2021 · influential citation
Imitation learning based on entropy-regularized forward and inverse reinforcement learning
2020 · influential citation
Stable Policy Optimization via Off-Policy Divergence Regularization
2020 · cites this paper
Scalable reinforcement learning for plant-wide control of vinyl acetate monomer process
2020 · cites this paper
When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey
2020 · cites this paper
Leverage the Average: an Analysis of Regularization in RL
2020 · influential citation
Mirror Descent Policy Optimization
2020 · cites this paper
PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning
2020 · cites this paper
Munchausen Reinforcement Learning
2020 · influential citation
A Relation Analysis of Markov Decision Process Frameworks
2020 · cites this paper
Ensuring Monotonic Policy Improvement in Entropy-regularized Value-based Reinforcement Learning
2020 · influential citation
Dynamic Actor-Advisor Programming for Scalable Safe Reinforcement Learning
2020 · cites this paper
Finding the Near Optimal Policy via Adaptive Reduced Regularization in MDPs
2020 · cites this paper
An Empirical Study of Exploration Strategies for Model-Free Reinforcement Learning
2020 · cites this paper
Improved Sample Complexity for Incremental Autonomous Exploration in MDPs
2020 · cites this paper
Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning
2020 · influential citation
Efficient and Noise-Tolerant Reinforcement Learning Algorithms via Theoretical Analysis of Gap-Increasing and Softmax Operators
2020 · influential citation
Forward and inverse reinforcement learning sharing network weights and hyperparameters
2020 · influential citation
Generalized Tsallis Entropy Reinforcement Learning and Its Application to Soft Mobile Robots
2020 · influential citation
Theoretical Analysis of Efficiency and Robustness of Softmax and Gap-Increasing Operators in Reinforcement Learning
2019 · influential citation
Balancing Exploitation and Exploration via Fully Probabilistic Design of Decision Policies
2019 · cites this paper
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
2019 · influential citation
On the Convergence of Approximate and Regularized Policy Iteration Schemes
2019 · cites this paper
Momentum in Reinforcement Learning
2019 · cites this paper
Provably Efficient Exploration in Policy Optimization
2019 · cites this paper
Imitation learning based on entropy-regularized reinforcement learning
2019 · influential citation
Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes
2019 · cites this paper
Stochastic Convergence Results for Regularized Actor-Critic Methods
2019 · cites this paper
Learning from demonstration for locally assistive mobility aids
2019 · influential citation
Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
2019 · cites this paper
Gap-Increasing Policy Evaluation for Efficient and Noise-Tolerant Reinforcement Learning
2019 · influential citation
Learning-Driven Exploration for Reinforcement Learning
2019 · cites this paper
Demonstration actor critic
2019 · cites this paper
Provable Q-Iteration with L infinity Guarantees and Function Approximation
2019 · cites this paper
A Convergence Result for Regularized Actor-Critic Methods
2019 · cites this paper
Deep reinforcement learning with smooth policy update: Application to robotic cloth manipulation
2019 · influential citation
A Theory of Regularized Markov Decision Processes
2019 · influential citation
Factorial Kernel Dynamic Policy Programming for Vinyl Acetate Monomer Plant Model Control
2018 · cites this paper
Applications of variable discounting dynamic programming to iterated function systems and related problems
2018 · cites this paper
Increased task perception for adaptable human-robot collaboration
2018 · cites this paper
Exponentially Weighted Imitation Learning for Batched Historical Data
2018 · cites this paper
Variational Bayesian Reinforcement Learning with Regret Bounds
2018 · cites this paper
A Constrained Randomized Shortest-Paths Framework for Optimal Exploration
2018 · influential citation
Path Consistency Learning in Tsallis Entropy Regularized MDPs
2018 · cites this paper
Bridging the Gap Between Value and Policy Based Reinforcement Learning
2017 · influential citation
Local driving assistance from demonstration for mobility aids
2017 · influential citation
Deep dynamic policy programming for robot control with raw images
2017 · cites this paper