Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policy Optimization

Souradip Chakraborty, A. S. Bedi, Kasun Weerakoon, Prithvi Poddar, Alec Koppel, Pratap Tokekar, Dinesh Manocha

Published 2023 in IEEE International Conference on Robotics and Automation

ABSTRACT

In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-SPG) algorithm to deal with the challenges of sparse rewards in continuous control problems. Sparse rewards are common in continuous control robotics tasks such as manipulation and navigation, and they make the learning problem hard because value functions are non-trivial to estimate over the state space. This demands either reward shaping or expert demonstrations for the sparse reward environment. However, obtaining high-quality demonstrations is expensive and sometimes impossible. We propose a heavy-tailed policy parametrization along with a modified momentum-based policy gradient tracking scheme (HT-SPG) to induce stable exploratory behavior in the algorithm. The proposed algorithm does not require access to expert demonstrations. We test the performance of HT-SPG on various benchmark continuous control tasks with sparse rewards, such as 1D Mario, Pathological Mountain Car, Sparse Pendulum in OpenAI Gym, and Sparse MuJoCo environments (Hopper-v2, Half-Cheetah, Walker-2D). We show consistent improvement in average cumulative reward across all tasks. We further demonstrate that a navigation policy trained using HT-SPG can be easily transferred to a Clearpath Husky robot to perform real-world navigation tasks.
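
The two ingredients named in the abstract, a heavy-tailed policy and momentum-based gradient tracking, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the Cauchy distribution as the heavy-tailed family, a plain exponential moving average as the tracking rule (the paper describes a modified scheme whose details are not given here), and the helper names `cauchy_score`, `momentum_track`, and `tail_fractions`.

```python
import numpy as np

def cauchy_score(action, loc, scale):
    """Gradient of log Cauchy(action; loc, scale) w.r.t. (loc, scale).

    Assumption: a Cauchy policy stands in for the heavy-tailed family.
    log p = log(scale) - log(pi) - log(scale^2 + (action - loc)^2)
    """
    z = action - loc
    denom = scale ** 2 + z ** 2
    d_loc = 2.0 * z / denom
    d_scale = (z ** 2 - scale ** 2) / (scale * denom)
    return np.array([d_loc, d_scale])

def momentum_track(prev_direction, grad_estimate, beta):
    """Simplified tracking step: exponential moving average of gradients.

    Assumption: this is a generic momentum rule, not the paper's modified scheme.
    """
    return (1.0 - beta) * prev_direction + beta * grad_estimate

def tail_fractions(rng, n, threshold):
    """Fraction of samples with |x| > threshold for Cauchy vs. Gaussian noise."""
    cauchy = rng.standard_cauchy(n)
    gauss = rng.standard_normal(n)
    return np.mean(np.abs(cauchy) > threshold), np.mean(np.abs(gauss) > threshold)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frac_cauchy, frac_gauss = tail_fractions(rng, 10_000, 5.0)
    print("share of far-from-mean actions (Cauchy, Gaussian):", frac_cauchy, frac_gauss)

    # One tracked policy-gradient step on a toy return signal (hypothetical values).
    loc, scale, direction = 0.0, 1.0, np.zeros(2)
    action = loc + scale * rng.standard_cauchy()
    toy_return = 1.0  # placeholder for a sparse reward that happened to fire
    grad = toy_return * cauchy_score(action, loc, scale)
    direction = momentum_track(direction, grad, beta=0.1)
    print("tracked direction:", direction)
```

In this sketch the heavy tails show up as a much larger share of actions far from the policy mean, which is the kind of exploratory behavior the abstract refers to. The Cauchy score stays bounded even for extreme actions, and the moving average smooths the resulting gradient estimates.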

PUBLICATION RECORD

Publication year
2023
Venue
IEEE International Conference on Robotics and Automation
Publication date
2023-05-29
Fields of study
Computer Science, Engineering
Identifiers
DOI 10.1109/ICRA48891.2023.10161186
Source metadata
Semantic Scholar

REFERENCES

A Survey of Imitation Learning Methods, Environments and Metrics
2024, cited by this paper
STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning
2023, cited by this paper
HTRON: Efficient Outdoor Navigation with Sparse Rewards via Heavy Tailed Adaptive Reinforce Algorithm
2022, cited by this paper
Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration
2022, influential reference
Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning
2022, cited by this paper
On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces
2022, cited by this paper
TERP: Reliable Planning in Uneven Outdoor Environments using Deep Reinforcement Learning
2021, cited by this paper
On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control
2021, influential reference
Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients
2021, cited by this paper
Stochastic Recursive Momentum for Policy Gradient Methods
2020, influential reference
On Reward Shaping for Mobile Robot Navigation: A Reinforcement Learning and SLAM Based Approach
2020, cited by this paper
Learning Robust Control Policies for End-to-End Autonomous Driving From Data-Driven Simulation
2020, cited by this paper
Robot Navigation in Crowded Environments Using Deep Reinforcement Learning
2020, cited by this paper
Guided Exploration with Proximal Policy Optimization using a Single Demonstration
2020, cited by this paper
On the Global Convergence Rates of Softmax Policy Gradient Methods
2020, influential reference
Policy Gradient From Demonstration and Curiosity
2020, cited by this paper
End-to-End Robotic Reinforcement Learning without Reward Engineering
2019, cited by this paper
The problem with DDPG: understanding failures in deterministic environments with sparse rewards
2019, influential reference
Reinforcement learning for robotic manipulation using simulated locomotion demonstrations
2019, cited by this paper
Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
2019, influential reference
Deep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards
2019, cited by this paper
Momentum-Based Variance Reduction in Non-Convex SGD
2019, influential reference
Motion Planning Among Dynamic, Decision-Making Agents with Deep Reinforcement Learning
2018, cited by this paper
Policy Optimization with Demonstrations
2018, influential reference
Stochastic Variance-Reduced Policy Gradient
2018, cited by this paper
Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards
2017, cited by this paper
Parameter Space Noise for Exploration
2017, cited by this paper
Overcoming Exploration in Reinforcement Learning with Demonstrations
2017, influential reference
Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations
2017, cited by this paper
Deep Q-learning From Demonstrations
2017, cited by this paper
Imitation Learning
2017, cited by this paper
Curiosity-Driven Exploration by Self-Supervised Prediction
2017, cited by this paper
Proximal Policy Optimization Algorithms
2017, influential reference
The Beta Policy for Continuous Control Reinforcement Learning
2017, cited by this paper
Generative Adversarial Imitation Learning
2016, cited by this paper
VIME: Variational Information Maximizing Exploration
2016, influential reference
OpenAI Gym
2016, cited by this paper
Trust Region Policy Optimization
2015, cited by this paper
Adam: A Method for Stochastic Optimization
2014, cited by this paper
Maximum Entropy Inverse Reinforcement Learning
2008, cited by this paper
Power-Law Distributions in Empirical Data
2007, cited by this paper
The Black Swan: The Impact of the Highly Improbable
2007, cited by this paper
Infinite-Horizon Policy-Gradient Estimation
2001, cited by this paper
Algorithms for Inverse Reinforcement Learning
2000, cited by this paper
Reinforcement Learning: An Introduction
1998, influential reference
Is the Geometry of Nature Fractal?
1998, cited by this paper
In Advances in Neural Information Processing Systems
1996, cited by this paper
Learning from Demonstration
1996, cited by this paper
Reward Functions for Accelerated Learning
1994, cited by this paper
Fractal Geometry of Nature
1984, cited by this paper

CITED BY

Environment Agnostic Goal-Conditioning, A Study of Reward-Free Autonomous Learning
2025, cites this paper
VAPOR: Legged Robot Navigation in Unstructured Outdoor Environments using Offline Reinforcement Learning
2024, cites this paper
Q-exponential Family for Policy Optimization
2024, cites this paper
Progressive Prioritized Experience Replay for Multi-Agent Reinforcement Learning
2024, cites this paper
ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm for Sparse Reward Continuous Control
2024, cites this paper
Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation
2023, cites this paper
REBEL: A Regularization-Based Solution for Reward Overoptimization in Reinforcement Learning from Human Feedback
2023, cites this paper