The divergence of reinforcement learning algorithms with value-iteration and function approximation
Michael Fairbank, Eduardo Alonso
Published 2011 in IEEE International Joint Conference on Neural Networks
ABSTRACT
This paper gives specific divergence examples of value iteration for several major Reinforcement Learning and Adaptive Dynamic Programming algorithms when a function approximator is used for the value function. These divergence examples differ from previous divergence examples in the literature in that they apply to a greedy policy, i.e. a "value iteration" scenario. Perhaps surprisingly, with a greedy policy it is also possible to obtain divergence for the algorithms TD(1) and Sarsa(1). We also demonstrate divergence for the Adaptive Dynamic Programming algorithms HDP, DHP and GDHP.
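The paper's own greedy-policy counterexamples are not reproduced in this record, but the general phenomenon the abstract refers to, divergence of bootstrapped value updates under function approximation, can be illustrated with a classic two-state construction in the style of Tsitsiklis and Van Roy. The sketch below is illustrative only and is not one of the paper's examples; the features, learning rate and discount factor are assumptions chosen for the demonstration.

```python
import numpy as np

# Illustrative sketch only: a classic two-state divergence construction in the
# style of Tsitsiklis and Van Roy, NOT one of the paper's greedy-policy
# counterexamples. A linear value function V(s) = w * phi(s) is fitted with
# features phi = [1, 2]; state 0 transitions to state 1, state 1 self-loops,
# and all rewards are zero, so the true value function is V* = 0 (i.e. w = 0).
phi = np.array([1.0, 2.0])   # feature of state 0 and of state 1
gamma = 0.99                 # discount factor (divergence needs gamma > 5/6 here)
alpha = 0.1                  # learning rate
w = 1.0                      # single weight, initialised away from the optimum

for sweep in range(50):
    v = w * phi                               # current value estimates
    targets = gamma * np.array([v[1], v[1]])  # bootstrapped targets (reward = 0)
    td_errors = targets - v
    # Synchronous update over BOTH states with equal weight; this uniform
    # update distribution is what breaks convergence despite zero rewards.
    w += alpha * np.dot(td_errors, phi)       # d/dw of V(s) is phi(s)

print(f"weight after 50 sweeps: {w:.3e}")     # |w| grows without bound
```

Although the exact value function is representable (w = 0), the iterates move steadily away from it. The paper's contribution is exhibiting analogous divergence under greedy-policy (value-iteration) updates, including for TD(1), Sarsa(1), HDP, DHP and GDHP.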
PUBLICATION RECORD
- Publication year: 2011
- Venue: IEEE International Joint Conference on Neural Networks
- Publication date: 2011-07-22
- Fields of study: Computer Science
- Source metadata: Semantic Scholar
REFERENCES
- 19 references
CITED BY
- 35 citing papers