Hybrid Reward Architecture for Reinforcement Learning

H. V. Seijen, Mehdi Fatemi, R. Laroche, Joshua Romoff, Tavian Barnes, Jeffrey Tsang

Published 2017 in Neural Information Processing Systems

ABSTRACT

One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, learning can be very slow and unstable in domains where the optimal value function cannot easily be reduced to a low-dimensional representation. This paper contributes towards tackling such challenging domains by proposing a new method, called Hybrid Reward Architecture (HRA). HRA takes as input a decomposed reward function and learns a separate value function for each component reward function. Because each component typically depends on only a subset of all features, the corresponding value function can be approximated more easily by a low-dimensional representation, enabling more effective learning. We demonstrate HRA on a toy problem and the Atari game Ms. Pac-Man, where HRA achieves above-human performance.
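
The core idea in the abstract (one value function per reward component, combined for action selection) can be illustrated with a small tabular sketch. This is not the authors' implementation: the paper uses a deep network, whereas the code below assumes a hypothetical five-state corridor, two hypothetical reward components (a goal bonus and a per-step cost), a Q-learning update in which each head bootstraps from its own maximum, and aggregation by summing the heads. All names (`step`, `hra_update`, `aggregate_q`, `greedy_action`, `train`) are illustrative.

```python
import random

# Hypothetical corridor: states 0..4, actions 0 = left, 1 = right, goal at state 4.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4


def step(s, a):
    """Deterministic transition returning a decomposed reward (assumed setup)."""
    s_next = max(s - 1, 0) if a == 0 else s + 1
    done = s_next == GOAL
    # Component 0: goal bonus. Component 1: per-step cost.
    # The environment reward is taken to be the sum of the components.
    rewards = [1.0 if done else 0.0, -0.1]
    return s_next, rewards, done


def hra_update(q_heads, s, a, rewards, s_next, done, alpha=0.5, gamma=0.9):
    """Apply one Q-learning step to every head, each using only its own reward."""
    for q_k, r_k in zip(q_heads, rewards):
        bootstrap = 0.0 if done else gamma * max(q_k[s_next])
        q_k[s][a] += alpha * (r_k + bootstrap - q_k[s][a])


def aggregate_q(q_heads, s):
    """Sum the per-component action values for state s."""
    return [sum(q_k[s][a] for q_k in q_heads) for a in range(N_ACTIONS)]


def greedy_action(q_heads, s):
    """Pick the action with the highest aggregated value."""
    totals = aggregate_q(q_heads, s)
    return max(range(N_ACTIONS), key=lambda a: totals[a])


def train(episodes=500, seed=0, max_steps=200):
    """Learn all heads from uniformly random behaviour (off-policy)."""
    rng = random.Random(seed)
    q_heads = [[[0.0] * N_ACTIONS for _ in range(N_STATES)] for _ in range(2)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = rng.randrange(N_ACTIONS)
            s_next, rewards, done = step(s, a)
            hra_update(q_heads, s, a, rewards, s_next, done)
            s = s_next
            if done:
                break
    return q_heads


q_heads = train()
print([greedy_action(q_heads, s) for s in range(GOAL)])
```

Each head sees only its own reward signal, so its table depends only on what that component needs; the agent acts on the sum. In this toy setting both components favour moving right, so the example shows the mechanics of decomposition rather than its benefits on a hard problem.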

PUBLICATION RECORD

Publication year: 2017
Venue: Neural Information Processing Systems
Publication date: 2017-06-13
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1706.04208
