Continuous Deep Q-Learning with Model-based Acceleration

S. Gu, T. Lillicrap, I. Sutskever, S. Levine

Published 2016 in International Conference on Machine Learning

ABSTRACT

Model-free reinforcement learning has been successfully applied to a range of challenging problems, and has recently been extended to handle large neural network policies and value functions. However, the sample complexity of model-free algorithms, particularly when using high-dimensional function approximators, tends to limit their applicability to physical systems. In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks. We propose two complementary techniques for improving the efficiency of such algorithms. First, we derive a continuous variant of the Q-learning algorithm, which we call normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods. The NAF representation allows us to apply Q-learning with experience replay to continuous tasks, and substantially improves performance on a set of simulated robotic control tasks. To further improve the efficiency of our approach, we explore the use of learned models for accelerating model-free reinforcement learning. We show that iteratively refitted local linear models are especially effective for this, and demonstrate substantially faster learning on domains where such models are applicable.
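
ILLUSTRATIVE SKETCH

The abstract does not spell out the NAF parameterization, so the following is only a minimal sketch of the commonly described form: the Q-function is split into a state value V(x) plus a quadratic advantage A(x, u) = -1/2 (u - mu(x))^T P(x) (u - mu(x)), where P(x) = L(x) L(x)^T is built from a lower-triangular matrix with an exponentiated diagonal. Under that assumption the greedy action is always mu(x), which is what lets Q-learning run on continuous actions. The function name `naf_q_value` and the numeric inputs are hypothetical stand-ins for network outputs, not values from the paper.

```python
import numpy as np


def naf_q_value(v, mu, l_entries, u):
    """Return Q(x, u) = V(x) - 0.5 * (u - mu)^T P (u - mu) for one state.

    v         : scalar state value V(x) (assumed network output)
    mu        : greedy action mu(x), shape (d,) (assumed network output)
    l_entries : d*(d+1)/2 raw lower-triangle entries, row-major order
    u         : action to evaluate, shape (d,)
    """
    dim = mu.shape[0]
    L = np.zeros((dim, dim))
    L[np.tril_indices(dim)] = l_entries
    # Exponentiate the diagonal so that P = L L^T is positive definite.
    diag = np.diag_indices(dim)
    L[diag] = np.exp(L[diag])
    P = L @ L.T
    delta = u - mu
    advantage = -0.5 * delta @ P @ delta
    return v + advantage


# Hypothetical outputs for a single state with a 2-D action.
v = 1.5
mu = np.array([0.2, -0.1])
l_entries = np.array([0.0, 0.3, 0.0])  # L00, L10, L11 before exponentiation

q_at_mu = naf_q_value(v, mu, l_entries, mu)                # equals V(x)
q_off = naf_q_value(v, mu, l_entries, np.array([1.0, 1.0]))  # strictly lower
```

Because the advantage term is never positive and vanishes at u = mu(x), the maximum over actions needed for the Q-learning target is simply V(x) in this sketch, with no inner optimization.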

PUBLICATION RECORD

Publication year: 2016
Venue: International Conference on Machine Learning
Publication date: 2016-03-02
Fields of study: Computer Science, Engineering
Identifiers: arXiv 1603.00748
Source metadata: Semantic Scholar

CITED BY

DRL-based Power Allocation in LiDAL-Assisted RLNC-NOMA OWC Systems
2026, influential citation
End-to-End Autonomous Driving: From Classic Paradigm to Large Model Empowerment—A Comprehensive Survey
2026, cites this paper
Joint Learning of Hierarchical Neural Options and Abstract World Model
2026, cites this paper
DARWIN: Digital Twin Assisted Robot Navigation and WIreless Network Management
2026, cites this paper
Balanced Exploration and Attention-Inspired Decision Making for Autonomous Driving
2026, cites this paper
Laplacian Representations for Decision-Time Planning
2026, cites this paper
SMAC: Score-Matched Actor-Critics for Robust Offline-to-Online Transfer
2026, cites this paper
TD-CD-MPPI: Temporal-Difference Constraint-Discounted Model Predictive Path Integral Control
2026, cites this paper
Optimizing wastewater treatment through combined deep learning and deep reinforcement learning: Recent advances and future prospects.
2026, cites this paper
Adaptive nonlinear recursive control based on normalized advantage neural network learning for large-scale cyber-physical power systems
2026, cites this paper
The Surprising Difficulty of Search in Model-Based Reinforcement Learning
2026, cites this paper
Adaptive optimization of BESS and grid set points: A model-free framework for energy management under dynamic tariff pricing
2026, cites this paper
Multi-Agent Model-Based Reinforcement Learning with Joint State-Action Learned Embeddings
2026, cites this paper
Learning to Share: Selective Memory for Efficient Parallel Agentic Systems
2026, cites this paper
Mixed-Size Placement Prototyping Based on Reinforcement Learning with Semi-Concurrent Optimization
2025, cites this paper
Multi-Agent Reinforcement Learning for Greenhouse Gas Offset Credit Markets
2025, cites this paper
A Novel Graph Neural Network Approach for Inverse Kinematics in Robotic Arms
2025, cites this paper
AnyMAC: Cascading Flexible Multi-Agent Collaboration via Next-Agent Prediction
2025, cites this paper
First Order Model-Based RL through Decoupled Backpropagation
2025, cites this paper
Invariant Control Strategies for Active Flow Control using Graph Neural Networks
2025, cites this paper
Online Learning-Based Predictive Control for Nonlinear System
2025, cites this paper
Leveraging World Model Disentanglement in Value-Based Multi-Agent Reinforcement Learning
2025, cites this paper
Robotic Arm Trajectory Planning in Dynamic Environments Based on Self-Optimizing Replay Mechanism
2025, cites this paper
Synthesis of Interacting Model-Based and Model-Free Controllers for Optimal Control
2025, cites this paper
Optimizing Decision Strategies through Advanced Learning Techniques
2025, cites this paper
Improve global generalization for personalized federated learning within a Stackelberg game
2025, cites this paper
Adaptive Optimal Admittance Control for Robotic Precision Grinding Based on Improved Normalized Advantage Function
2025, cites this paper
Experimental data-efficient reinforcement learning with an ensemble of surrogate models
2025, cites this paper
Is Bellman Equation Enough for Learning Control?
2025, cites this paper
MLLMs for Versatile Scene Understanding: Towards Embodied Intelligent Surgical Robots
2025, cites this paper
Reinforcement Learning for Real-Time Decision Making in Autonomous Systems
2025, cites this paper
Look Before Leap: Look-Ahead Planning with Uncertainty in Reinforcement Learning
2025, cites this paper
Motion planning for 7-degree-of-freedom bionic arm: Deep deterministic policy gradient algorithm based on imitation of human action
2025, cites this paper
A digital twin of multiple energy hub systems with peer-to-peer energy sharing
2025, cites this paper
Value-Based Reinforcement Learning for Mapless Navigation of a Mobile Robot
2025, cites this paper
Factor Learning Portfolio Optimization Informed by Continuous-Time Finance Models
2025, cites this paper
Soft Normalized Advantage Functions for Reinforcement Learning in Optimal Control Problems
2025, cites this paper
Overhead line path planning based on deep reinforcement learning and geographical information system
2025, cites this paper
Quantum Boltzmann Machines for Sample-Efficient Reinforcement Learning
2025, cites this paper
Graph-Enhanced Policy Optimization in LLM Agent Training
2025, cites this paper
Rapidly Adapting Policies to the Real World via Simulation-Guided Fine-Tuning
2025, cites this paper
FraudGNN-RL: A Graph Neural Network With Reinforcement Learning for Adaptive Financial Fraud Detection
2025, cites this paper
Continuous Q-Score Matching: Diffusion Guided Reinforcement Learning for Continuous-Time Control
2025, cites this paper
Improved DQN-Based Intelligent Trajectory Control for Coal Gangue Sorting Robotic Manipulators
2025, cites this paper
Inferring effort-safety trade off in perturbed squat-to-stand task by reward parameter estimation
2025, cites this paper
Model-based reinforcement learning with adversarial discriminative domain adaptation
2025, cites this paper
Actor-Free Continuous Control via Structurally Maximizable Q-Functions
2025, cites this paper
Simulation-based deep reinforcement learning for process time optimization in semiconductor cluster tool and an empirical study
2025, cites this paper
Online Learning Agents for Group Cooperative Control Systems
2025, cites this paper
Study on deep reinforcement learning for multi-task scheduling in cloud manufacturing
2025, influential citation
Why does the two-timescale Q-learning converge to different mean field solutions? A unified convergence analysis
2024, cites this paper
A Multi-Agent Deep Reinforcement Learning Framework for Personalized Cancer Treatment Decision Support in Dynamic Clinical Scenarios
2024, cites this paper
Gray-Box Nonlinear Feedback Optimization
2024, cites this paper
Valuation of Stocks by Integrating Discounted Cash Flow With Imitation Learning and Guided Policy
2024, cites this paper
Continuous Control With Swarm Intelligence Based Value Function Approximation
2024, cites this paper
Autonomous driving policy learning from demonstration using regression loss function
2024, cites this paper
Trajectory Planning of Robotic Manipulator in Dynamic Environment Exploiting DRL
2024, cites this paper
Conservative DDPG -- Pessimistic RL without Ensemble
2024, cites this paper
Dynamic Explanation Emphasis in Human-XAI Interaction with Communication Robot
2024, cites this paper
Deep Reinforcement Learning in Autonomous Car Path Planning and Control: A Survey
2024, cites this paper
User Decision Guidance with Selective Explanation Presentation from Explainable-AI
2024, cites this paper
Reinforcement Learning With Adaptive Policy Gradient Transfer Across Heterogeneous Problems
2024, cites this paper
Noisy Spiking Actor Network for Exploration
2024, cites this paper
Monotone, Bi-Lipschitz, and Polyak-Łojasiewicz Networks
2024, cites this paper
High-efficiency reinforcement learning with hybrid architecture photonic integrated circuit
2024, cites this paper
Towards Jumping Skill Learning by Target-guided Policy Optimization for Quadruped Robots
2024, cites this paper
Graph Reinforcement Learning for Combinatorial Optimization: A Survey and Unifying Perspective
2024, influential citation
Continual Semi-Supervised Malware Detection
2024, cites this paper
G2P2C - A modular reinforcement learning algorithm for glucose control by glucose prediction and planning in Type 1 Diabetes
2024, cites this paper
Innovative Integration of Q-learning in BSO Algorithm for Adaptive Community Detection in Social Networks
2024, cites this paper
Advances in Deep Reinforcement Learning for Computer Vision Applications
2024, cites this paper
Learning to Avoid Collisions in Mobile Manipulator Reaching Task
2024, cites this paper
Research on trajectory planning based on deep reinforcement learning
2024, cites this paper
Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning
2024, cites this paper
Research on Agents Decision-making Ability based on Adaptive Deep Learning
2024, cites this paper
Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions
2024, cites this paper
Prioritized Generative Replay
2024, cites this paper
A Survey on Recent Advancements in Autonomous Driving Using Deep Reinforcement Learning: Applications, Challenges, and Solutions
2024, cites this paper
Constraints Driven Safe Reinforcement Learning for Autonomous Driving Decision-Making
2024, cites this paper
Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling
2024, cites this paper
Drone Landing and Reinforcement Learning: State-of-Art, Challenges and Opportunities
2024, cites this paper
Deep reinforcement learning control of combined chemotherapy and anti-angiogenic drug delivery for cancerous tumor treatment
2024, cites this paper
Distribution Guided Active Feature Acquisition
2024, cites this paper
Learning in complex action spaces without policy gradients
2024, cites this paper
Deep reinforcement learning based proactive dynamic obstacle avoidance for safe human-robot collaboration
2024, cites this paper
Model-based Reinforcement Learning for Sim-to-Real Transfer in Robotics using HTM neural networks
2024, cites this paper
Information-Directed Pessimism for Offline Reinforcement Learning
2024, cites this paper
Online Control-Informed Learning
2024, cites this paper
Influence of Visual Observations’ Dimensionality Reduction on a Deep Reinforcement Learning Controlled Terrestrial Robot
2024, cites this paper
Reinforcement learning strategies using Monte-Carlo to solve the blackjack problem
2024, cites this paper
Trajectory Planning of Robotic Manipulator in Dynamic Environment Exploiting Deep Reinforcement Learning
2024, cites this paper
Deep reinforcement learning challenges and opportunities for urban water systems.
2024, cites this paper
Delta robot control by learning systems: Harnessing the power of deep reinforcement learning algorithms
2024, cites this paper
Multi-agent dual actor-critic framework for reinforcement learning navigation
2024, cites this paper
Discretizing Continuous Action Space With Unimodal Probability Distributions for On-Policy Reinforcement Learning
2024, cites this paper
Blackbox Simulation Optimization
2024, cites this paper
Deep Reinforcement Learning Robots for Algorithmic Trading: Considering Stock Market Conditions and U.S. Interest Rates
2024, cites this paper
Reinforcement Learning-Based Approaches for Enhancing Security and Resilience in Smart Control: A Survey on Attack and Defense Methods
2024, cites this paper
Application of Deep Reinforcement Learning to Defense and Intrusion Strategies Using Unmanned Aerial Vehicles in a Versus Game
2024, cites this paper
Trajectory Design and Bandwidth Allocation Considering Power-Consumption Outage for UAV Communication: A Machine Learning Approach
2024, cites this paper