Hindsight policy gradients

Paulo E. Rauber, Filipe Wall Mutz, J. Schmidhuber

Published 2017 in International Conference on Learning Representations

ABSTRACT

Goal-conditional policies allow reinforcement learning agents to pursue specific goals during different episodes. In addition to their potential to generalize desired behavior to unseen goals, such policies may also help in defining options for arbitrary subgoals, enabling higher-level planning. While trying to achieve a specific goal, an agent may also be able to exploit information about the degree to which it has achieved alternative goals. Reinforcement learning agents have only recently been endowed with such capacity for hindsight, which is highly valuable in environments with sparse rewards. In this paper, we show how hindsight can be introduced to likelihood-ratio policy gradient methods, generalizing this capacity to an entire class of highly successful algorithms. Our preliminary experiments suggest that hindsight may increase the sample efficiency of policy gradient methods.

PUBLICATION RECORD

Publication year: 2017
Venue: International Conference on Learning Representations
Publication date: 2017-11-01
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1711.06006
Source metadata: Semantic Scholar

CITED BY

Prioritization Hindsight Experience Based on Spatial Position Attention for Robots
2025, cites this paper
HRM-Agent: Training a recurrent reasoning model in dynamic environments using reinforcement learning
2025, cites this paper
BaNEL: Exploration Posteriors for Generative Modeling Using Only Negative Rewards
2025, cites this paper
DAgger-Based Hindsight Policy Optimization for Reach-Avoid Games
2025, cites this paper
Autonomous Learning From Success and Failure: Goal-Conditioned Supervised Learning with Negative Feedback
2025, cites this paper
Hierarchical Reinforcement Learning in Multi-Goal Spatial Navigation with Autonomous Mobile Robots
2025, cites this paper
Optimal bipartite graph matching-based goal selection for policy-based hindsight learning
2024, cites this paper
Hindsight Experience Replay Accelerates Proximal Policy Optimization
2024, cites this paper
Progressively Learning to Reach Remote Goals by Continuously Updating Boundary Goals
2024, cites this paper
Directed Exploration in Reinforcement Learning from Linear Temporal Logic
2024, cites this paper
Offline Goal-Conditioned Reinforcement Learning for Safety-Critical Tasks with Recovery Policy
2024, cites this paper
Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning
2023, cites this paper
Understanding when Dynamics-Invariant Data Augmentations Benefit Model-Free Reinforcement Learning Updates
2023, cites this paper
Learning Interactive Real-World Simulators
2023, cites this paper
Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis
2023, cites this paper
Deep Generative Models for Decision-Making and Control
2023, cites this paper
Goal-Conditioned Supervised Learning with Sub-Goal Prediction
2023, cites this paper
Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning
2023, cites this paper
Skill Decision Transformer
2023, cites this paper
Hidden Agenda: a Social Deduction Game with Diverse Learned Equilibria
2022, cites this paper
Online Decision Transformer
2022, cites this paper
Hypernetworks for Zero-shot Transfer in Reinforcement Learning
2022, cites this paper
Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks
2022, cites this paper
Consistent Experience Replay in High-Dimensional Continuous Control with Decayed Hindsights
2022, cites this paper
Goal-Conditioned Generators of Deep Policies
2022, influential citation
Open-Ended Reinforcement Learning with Neural Reward Functions
2022, cites this paper
MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning
2021, cites this paper
DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies
2021, cites this paper
FirePlace: Placing Firecracker Virtual Machines with Hindsight Imitation
2021, cites this paper
Learning One Representation to Optimize All Rewards
2021, cites this paper
Hindsight Goal Ranking on Replay Buffer for Sparse Reward Environment
2021, cites this paper
Offline Reinforcement Learning as One Big Sequence Modeling Problem
2021, cites this paper
Unbiased Methods for Multi-Goal Reinforcement Learning
2021, cites this paper
Hindsight Curriculum Generation Based Multi-Goal Experience Replay
2021, cites this paper
Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay
2021, cites this paper
Addressing Hindsight Bias in Multigoal Reinforcement Learning
2021, cites this paper
Exploration in Deep Reinforcement Learning: A Comprehensive Survey
2021, cites this paper
Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL
2021, cites this paper
Exploration in Deep Reinforcement Learning: From Single-Agent to Multiagent Domain
2021, cites this paper
Reinforcement Learning as One Big Sequence Modeling Problem
2021, cites this paper
Towards Continual Reinforcement Learning: A Review and Perspectives
2020, cites this paper
Episodic Self-Imitation Learning with Hindsight
2020, cites this paper
Scalable Multi-Task Imitation Learning with Autonomous Improvement
2020, cites this paper
Hindsight Experience Replay with Kronecker Product Approximate Curvature
2020, cites this paper
Research on Complex Robot Manipulation Tasks Based on Hindsight Trust Region Policy Optimization
2020, cites this paper
Counterfactual Credit Assignment in Model-Free Reinforcement Learning
2020, cites this paper
HIGhER: Improving instruction following with Hindsight Generation for Experience Replay
2020, cites this paper
Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning
2020, influential citation
Policy Continuation with Hindsight Inverse Dynamics
2019, cites this paper
HINDSIGHT TRUST REGION POLICY OPTIMIZATION
2019, influential citation
Curriculum-guided Hindsight Experience Replay
2019, cites this paper
Hindsight Credit Assignment
2019, cites this paper
Invariant Transform Experience Replay
2019, cites this paper
Options as responses: Grounding behavioural hierarchies in multi-agent RL
2019, cites this paper
Learning Latent Plans from Play
2019, cites this paper
Invariant Transform Experience Replay: Data Augmentation for Deep Reinforcement Learning
2019, cites this paper
Learning to Reach Goals via Iterated Supervised Learning
2019, cites this paper
Reinforcement Learning Upside Down: Don't Predict Rewards - Just Map Them to Actions
2019, influential citation
Training Agents using Upside-Down Reinforcement Learning
2019, cites this paper
Towards More Sample Efficiency in Reinforcement Learning with Data Augmentation
2019, cites this paper
Self-Educated Language Agent with Hindsight Experience Replay for Instruction Following
2019, cites this paper
Hindsight Trust Region Policy Optimization
2019, influential citation
Guided goal generation for hindsight multi-goal reinforcement learning
2019, cites this paper
Self-supervised Learning of Distance Functions for Goal-Conditioned Reinforcement Learning
2019, cites this paper
Exploration via Hindsight Goal Generation
2019, cites this paper
Curiosity-Driven Multi-Criteria Hindsight Experience Replay
2019, cites this paper
Maximum Entropy-Regularized Multi-Goal Reinforcement Learning
2019, cites this paper
A Survey on Policy Search Algorithms for Learning Robot Controllers in a Handful of Trials
2018, cites this paper
Visual Reinforcement Learning with Imagined Goals
2018, cites this paper
Improvements on Hindsight Learning
2018, influential citation
Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research
2018, cites this paper
Cooperative Reinforcement Learning
2017, cites this paper
Unbiased Methods for Multi-Goal RL
year unknown, cites this paper