Taming the Noise in Reinforcement Learning via Soft Updates

Published 2015 in Conference on Uncertainty in Artificial Intelligence

ABSTRACT

Model-free reinforcement learning algorithms, such as Q-learning, perform poorly in the early stages of learning in noisy environments, because much effort is spent unlearning biased estimates of the state-action value function. The bias results from selecting, among several noisy estimates, the apparent optimum, which may actually be suboptimal. We propose G-learning, a new off-policy learning algorithm that regularizes the value estimates by penalizing deterministic policies in the beginning of the learning process. We show that this method reduces the bias of the value-function estimation, leading to faster convergence to the optimal value and the optimal policy. Moreover, G-learning enables the natural incorporation of prior domain knowledge, when available. The stochastic nature of G-learning also makes it avoid some exploration costs, a property usually attributed only to on-policy algorithms. We illustrate these ideas in several examples, where G-learning results in significant improvements of the convergence rate and the cost of the learning process.
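
The bias described in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's algorithm: it assumes a reward-maximizing convention, Gaussian noise on each value estimate, and a uniform prior policy. The names `soft_max_value`, `mean_bias`, `beta`, and `noise_std` are hypothetical and chosen only for this illustration. It compares the hard maximum over noisy estimates with a log-sum-exp "soft" maximum whose sharpness is set by `beta`.

```python
import math
import random


def soft_max_value(estimates, beta, prior=None):
    """Soft maximum: (1/beta) * log(sum_a prior[a] * exp(beta * q_a)).

    As beta -> 0 this approaches the prior-weighted mean of the estimates;
    as beta -> infinity it approaches the hard maximum.
    """
    n = len(estimates)
    if prior is None:
        prior = [1.0 / n] * n  # assumption: uniform prior over actions
    m = max(estimates)  # subtract the max so exp() cannot overflow
    s = sum(p * math.exp(beta * (q - m)) for p, q in zip(prior, estimates))
    return m + math.log(s) / beta


def mean_bias(estimator, true_values, noise_std, trials, rng):
    """Average of estimator(noisy estimates) minus the true best value."""
    true_best = max(true_values)
    total = 0.0
    for _ in range(trials):
        noisy = [v + rng.gauss(0.0, noise_std) for v in true_values]
        total += estimator(noisy)
    return total / trials - true_best


if __name__ == "__main__":
    rng = random.Random(0)
    true_values = [0.0] * 5  # all actions are equally good in truth
    hard = mean_bias(max, true_values, 1.0, 20000, rng)
    soft = mean_bias(lambda q: soft_max_value(q, beta=0.1),
                     true_values, 1.0, 20000, rng)
    print(f"hard-max bias: {hard:.3f}")
    print(f"soft-max bias (beta=0.1): {soft:.3f}")
```

In this toy setting the hard maximum overestimates the true best value, while the soft maximum at small `beta` stays much closer to it. When the true values differ, a small `beta` underestimates the best value instead, which is consistent with the abstract's point that the penalty on deterministic policies applies in the beginning of the learning process rather than throughout.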

PUBLICATION RECORD

Publication year: 2015
Venue: Conference on Uncertainty in Artificial Intelligence
Publication date: 2015-12-28
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1512.08562
Source metadata: Semantic Scholar

REFERENCES

Human-level control through deep reinforcement learning
2015cited by this paper
Deep Reinforcement Learning with Double Q-Learning
2015cited by this paper
Increasing the Action Gap: New Operators for Reinforcement Learning
2015influential reference
Stochastic approximation
2013cited by this paper
Trading Value and Information in MDPs
2012influential reference
An information-theoretic approach to curiosity-driven reinforcement learning
2012cited by this paper
An Intelligent Battery Controller Using Bias-Corrected Q-learning
2012cited by this paper
Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax
2011cited by this paper
Speedy Q-Learning
2011cited by this paper
Reinforcement Learning and Dynamic Programming Using Function Approximators
2010cited by this paper
Double Q-learning
2010cited by this paper
Approximate Inference and Stochastic Optimal Control
2010cited by this paper
Algorithms for Reinforcement Learning
2010influential reference
Relative Entropy Policy Search
2010cited by this paper
Dynamic policy programming
2010cited by this paper
Approximate dynamic programming: solving the curses of dimensionality
2009cited by this paper
Efficient computation of optimal actions
2009cited by this paper
A theoretical and empirical analysis of Expected Sarsa
2009influential reference
Optimal control as a graphical model inference problem
2009cited by this paper
Dynamic Programming and Optimal Control Problem Set: Deterministic Systems and the Shortest Path Problem
2008cited by this paper
Approximate Dynamic Programming
2007cited by this paper
Noisy reinforcements in reinforcement learning: some case studies based on gridworlds
2006cited by this paper
Linearly-solvable Markov decision problems
2006cited by this paper
The Optimizer's Curse: Skepticism and Postdecision Surprise in Decision Analysis
2006cited by this paper
Probability theory: the logic of science
2005cited by this paper
Learning Rates for Q-learning
2004cited by this paper
Rational Overoptimism (and Other Biases)
2004cited by this paper
R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
2001cited by this paper
Issues in Using Function Approximation for Reinforcement Learning
1999cited by this paper
Introduction to Reinforcement Learning
1998cited by this paper
Bayesian Q-Learning
1998cited by this paper
Reinforcement Learning: An Introduction
1998influential reference
Estimator Variance in Reinforcement Learning: Theoretical Problems and Practical Solutions
1997cited by this paper
Dynamic Programming and Optimal Control, Two Volume Set
1995cited by this paper
When the Best Move Isn't Optimal: Q-learning with Exploration
1994influential reference
On Reinforcement Learning of Control Actions in Noisy and Non-Markovian Domains
1994cited by this paper
Reinforcement learning in continuous time: advantage updating
1994cited by this paper
Advantage Updating Applied to a Differential Game
1994cited by this paper
Q-learning
1992influential reference
Anomalies: The Winner's Curse
1988cited by this paper
Competitive Bidding in High-Risk Situations
1971cited by this paper
Stochastic Approximation
1969cited by this paper

CITED BY

MePoly: Max Entropy Polynomial Policy Optimization
2026cites this paper
Multi-Step Alignment as Markov Games: An Optimistic Online Mirror Descent Approach with Convergence Guarantees
2026cites this paper
Model-free policy gradient for discrete-time mean-field control
2026cites this paper
Conformal Policy Control
2026cites this paper
A novel DOE surface design method based on reinforcement learning architecture
2025cites this paper
Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation
2025cites this paper
Residual Policy Gradient: A Reward View of KL-regularized Objective
2025cites this paper
QeRL: Beyond Efficiency - Quantization-enhanced Reinforcement Learning for LLMs
2025cites this paper
Soft Update for Enhanced DQN Based on Mountain Car Problem
2025cites this paper
When Maximum Entropy Misleads Policy Optimization
2025cites this paper
Uncertainty and Noise Aware Decision Making for Autonomous Vehicles: A Bayesian Approach
2025cites this paper
Divergence-Augmented Policy Optimization
2025cites this paper
Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games
2025cites this paper
ABS-TD3: Efficient IoT data submission in DAG-based DLTs for digital circular economy
2025cites this paper
Multi-Objective Autonomous Eco-Driving Strategy: A Pathway to Future Green Mobility
2025cites this paper
Reinforcement Learning for Process Control: Review and Benchmark Problems
2025cites this paper
Maximum Next-State Entropy for Efficient Reinforcement Learning
2025cites this paper
On the Effectiveness of Regularization Methods for Soft Actor-Critic in Discrete-Action Domains
2025cites this paper
Intelligent Decision Making in Dynamic Environments Based on Evolutionary Game Theory and Multi-Agent Reinforcement Learning
2025cites this paper
Failure Modes of Maximum Entropy RLHF
2025cites this paper
Multi-Armed Sampling Problem and the End of Exploration
2025cites this paper
A Weighted Smooth Q-Learning Algorithm
2025cites this paper
Convergence Theorems for Entropy-Regularized and Distributional Reinforcement Learning
2025cites this paper
Beyond Prompt Chaining: The TB-CSPN Architecture for Agentic AI
2025cites this paper
Pseudo-distribution elite critics: Enhancing accuracy in reinforcement learning value estimation
2025cites this paper
Efficient Energy Management of Plug-In Hybrid Electric Vehicles Through Ensemble With In-Target Minimization Q-Learning
2025cites this paper
Exploring reinforcement learning in process control: a comprehensive survey
2025cites this paper
Q♯: Provably Optimal Distributional RL for LLM Post-Training
2025cites this paper
Variational Stochastic Games
2025cites this paper
Efficient Learning for Entropy-Regularized Markov Decision Processes via Multilevel Monte Carlo
2025cites this paper
Generative Actor Critic
2025cites this paper
Multi-Agent Q-Learning via Best Choice Dynamics
2025cites this paper
Uncertainty-Aware Reinforcement Learning Agents for Noisy Environments
2025cites this paper
Mind Your Entropy: From Maximum Entropy to Trajectory Entropy-Constrained RL
2025cites this paper
Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees
2025cites this paper
Noise Resilience of Successor and Predecessor Feature Algorithms in One- and Two-Dimensional Environments
2025cites this paper
Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization
2025cites this paper
Towards Sustainable High-Speed Cruising: Optimizing Energy Efficiency of Plug-in Hybrid Electric Vehicle via Intelligent Pulse-and-Glide Strategy
2024cites this paper
Opponent’s Dynamic Prediction Model-Based Power Control Scheme in Secure Transmission and Smart Jamming Game
2024cites this paper
Deep reinforcement learning for time-critical wilderness search and rescue using drones
2024cites this paper
Improving GFlowNets for Text-to-Image Diffusion Alignment
2024cites this paper
Value Improved Actor Critic Algorithms
2024cites this paper
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
2024cites this paper
Robust off-policy Reinforcement Learning via Soft Constrained Adversary
2024cites this paper
The Evolution of Reinforcement Learning in Quantitative Finance: A Survey
2024cites this paper
Efficient Reinforcement Learning With the Novel N-Step Method and V-Network
2024cites this paper
Parameter Efficient Reinforcement Learning from Human Feedback
2024cites this paper
Optimality Theory of Stigmergic Collective Information Processing by Chemotactic Cells
2024influential citation
Reinforcement learning for Multiple Goals in Goals-Based Wealth Management
2024cites this paper
Ancestral Reinforcement Learning: Unifying Zeroth-Order Optimization and Genetic Algorithms for Reinforcement Learning
2024cites this paper
A unified framework to control estimation error in reinforcement learning
2024cites this paper
Decoupling regularization from the action space
2024cites this paper
Robust iterative value conversion: Deep reinforcement learning for neurochip-driven edge robots
2024cites this paper
Addressing maximization bias in reinforcement learning with two-sample testing
2024cites this paper
Averaging log-likelihoods in direct alignment
2024cites this paper
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
2024cites this paper
The Max-Min Formulation of Multi-Objective Reinforcement Learning: From Theory to a Model-Free Algorithm
2024cites this paper
Random Policy Evaluation Uncovers Policies of Generative Flow Networks
2024cites this paper
Dissecting Deep RL with High Update Ratios: Combatting Value Overestimation and Divergence
2024cites this paper
Deep Reinforcement Learning in Non-Markov Market-Making
2024cites this paper
Reinforcement learning for multi-agent with asynchronous missing information fusion method
2024cites this paper
Investigating Transfer Learning in Noisy Environments: A Study of Predecessor and Successor Features in Spatial Learning Using a T-Maze
2024cites this paper
Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow
2024cites this paper
Highway Reinforcement Learning
2024cites this paper
Enhancing Reinforcement Learning in Sensor Fusion: A Comparative Analysis of Cubature and Sampling-based Integration Methods for Rover Search Planning
2024cites this paper
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement
2024cites this paper
Reward-Punishment Reinforcement Learning with Maximum Entropy
2024influential citation
Satisficing Exploration for Deep Reinforcement Learning
2024cites this paper
Comparing Deterministic and Soft Policy Gradients for Optimizing Gaussian Mixture Actors
2024cites this paper
PERL: Parameter Efficient Reinforcement Learning from Human Feedback
2024cites this paper
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
2024cites this paper
Imitation-Regularized Optimal Transport on Networks: Provable Robustness and Application to Logistics Planning
2024cites this paper
Diminishing Return of Value Expansion Methods
2024cites this paper
ERL-TD: Evolutionary Reinforcement Learning Enhanced with Truncated Variance and Distillation Mutation
2024cites this paper
Dissecting Deep RL with High Update Ratios: Combatting Value Divergence
2024cites this paper
Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review
2024cites this paper
Entropy-Regularized Token-Level Policy Optimization for Large Language Models
2024cites this paper
Variational Stochastic Games
2024cites this paper
Discrete Probabilistic Inference as Control in Multi-path Environments
2024cites this paper
AUV Surfacing Control With Adversarial Attack Against DLaaS Framework
2024cites this paper
Skill enhancement learning with knowledge distillation
2024cites this paper
Relaxed Equilibria for Time-Inconsistent Markov Decision Processes
2023cites this paper
Swap Softmax Twin Delayed Deep Deterministic Policy Gradient
2023cites this paper
Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning
2023cites this paper
Coherent Soft Imitation Learning
2023cites this paper
A Novel Model-Assisted Decentralized Multi-Agent Reinforcement Learning for Joint Optimization of Hybrid Beamforming in Massive MIMO mmWave Systems
2023cites this paper
A shared novelty-seeking basis for creativity and curiosity
2023cites this paper
Learning non-Markovian Decision-Making from State-only Sequences
2023cites this paper
Bayesian Reinforcement Learning With Limited Cognitive Load
2023cites this paper
Maximum Entropy Heterogeneous-Agent Mirror Learning
2023cites this paper
Exploring the Noise Resilience of Successor Features and Predecessor Features Algorithms in One and Two-Dimensional Environments
2023cites this paper
Fast Rates for Maximum Entropy Exploration
2023influential citation
Statistical Inference with Stochastic Gradient Methods under $\phi$-mixing Data
2023cites this paper
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage
2023influential citation
Maximum causal entropy inverse constrained reinforcement learning
2023cites this paper
Multi-Task Reinforcement Learning in Continuous Control with Successor Feature-Based Concurrent Composition
2023cites this paper
Careful at Estimation and Bold at Exploration
2023cites this paper
Maximum Entropy Optimal Control of Continuous-Time Dynamical Systems
2023cites this paper
Time-series Generation by Contrastive Imitation
2023cites this paper
Comparative Study for Deep Deterministic Policy Gradient and Soft Actor Critic Using an Inverted Pendulum System
2023cites this paper