Reinforcement learning from simultaneous human and MDP reward

Published 2012 in Adaptive Agents and Multi-Agent Systems

ABSTRACT

As computational agents are increasingly used beyond research labs, their success will depend on their ability to learn new skills and adapt to their dynamic, complex environments. If human users, without programming skills, can transfer their task knowledge to agents, learning can accelerate dramatically, reducing costly trials. The TAMER framework guides the design of agents whose behavior can be shaped through signals of approval and disapproval, a natural form of human feedback. More recently, TAMER+RL was introduced to enable human feedback to augment a traditional reinforcement learning (RL) agent that learns from a Markov decision process's (MDP) reward signal. We address limitations of prior work on TAMER and TAMER+RL, contributing in two critical directions. First, the four successful techniques for combining human reward with RL from prior TAMER+RL work are tested on a second task, and these techniques' sensitivities to parameter changes are analyzed. Together, these examinations yield more general and prescriptive conclusions to guide others who wish to incorporate human knowledge into an RL algorithm. Second, TAMER+RL has thus far been limited to a sequential setting, in which training occurs before learning from MDP reward. In this paper, we introduce a novel algorithm that shares the same spirit as TAMER+RL but learns simultaneously from both reward sources, enabling the human feedback to come at any time during the reinforcement learning process. We call this algorithm simultaneous TAMER+RL. To enable simultaneous learning, we introduce a new technique that appropriately determines the magnitude of the human model's influence on the RL algorithm throughout time and state-action space.
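
The abstract does not spell out how human reward is combined with RL, so the following Python is only a minimal sketch of one way such a combination could look, not the paper's implementation. It assumes a tabular Q-function `q`, a learned human-reward model stored as a table `h`, and a hypothetical weight `beta` that scales the human model's contribution to action selection, while the Q-update itself uses only the MDP reward. All names and parameter values are illustrative assumptions.

```python
import random
from collections import defaultdict


def select_action(q, h, state, actions, beta, epsilon=0.0, rng=random):
    """Epsilon-greedy choice over Q(s, a) + beta * H(s, a).

    `h` stands in for a learned model of human reward (an assumption here);
    `beta` controls how strongly that model biases action selection.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)] + beta * h[(state, a)])


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update driven by the MDP reward only."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Toy setup: the Q-table slightly prefers "left", the human model prefers "right".
q = defaultdict(float)
h = defaultdict(float)
actions = ["left", "right"]
q[("s0", "left")] = 0.5
h[("s0", "right")] = 1.0

biased = select_action(q, h, "s0", actions, beta=1.0)    # human model included
unbiased = select_action(q, h, "s0", actions, beta=0.0)  # human model ignored

# One learning step from MDP reward after taking "right" in "s0".
q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
```

With `beta=1.0` the human model tips the choice toward "right", and with `beta=0.0` the agent falls back to its own Q-values. Keeping the human model out of the Q-update is a design choice of this sketch; other combination schemes add the weighted human reward to the MDP reward or to the Q-values directly.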

PUBLICATION RECORD

Publication year
2012
Venue
Adaptive Agents and Multi-Agent Systems
Publication date
2012-06-04
Fields of study
Computer Science
Identifiers
DOI 10.65109/pajo3896
Source metadata
Semantic Scholar

CLAIMS

A new technique is introduced to determine the magnitude of the human model's influence across time and state-action space.
Confidence 0.93

박진우 (dztg5apj7m), extraction; B (s683577b42), review; Anonymous (12632b8b5f), review; AK (4715169a40), review
Simultaneous TAMER+RL learns from human feedback and MDP reward at the same time, allowing feedback to arrive at any point during reinforcement learning.
Confidence 0.98

The resulting comparisons support more general and prescriptive guidance for incorporating human knowledge into reinforcement-learning algorithms.
Confidence 0.90

Four prior techniques for combining human reward with reinforcement learning were evaluated on a second task, and their sensitivity to parameter changes was analyzed.
Confidence 0.95


CONCEPTS

human feedback
signal, reward source

Human-provided approval or disapproval signals used as a learning signal for the agent.

Aliases: human reward

human model influence
quantity, model component

The estimated strength of the human feedback model's contribution to the RL update process.

MDP reward
reward source

The reward signal supplied by the Markov decision process that the RL agent normally learns from.

Aliases: environment reward

parameter sensitivity analysis
analysis

An analysis of how the performance of a method changes when its parameter settings are varied.

reinforcement learning
method, learning setting

A learning setting in which an agent improves its policy from reward signals.

Aliases: RL

simultaneous TAMER+RL
algorithm

A variant of TAMER+RL designed to learn from human feedback and MDP reward simultaneously during training.

state-action space
space, representation

The space of all possible state and action pairs considered by the reinforcement-learning agent.

TAMER framework
framework

A framework for shaping an agent's behavior through approval and disapproval signals from humans.

Aliases: TAMER

TAMER+RL
algorithm, method

A reinforcement-learning variant that combines human feedback with reward from a Markov decision process.

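
The "human model influence" concept above describes a weight that varies over time and over state-action space. The record does not give the paper's mechanism, so the Python below is only a hypothetical illustration of the idea: a per-state-action trace is set when the human gives feedback, decays at every time step, and scales the influence weight. The class name `InfluenceTracker` and the parameters `decay` and `scale` are assumptions made for this sketch.

```python
from collections import defaultdict


class InfluenceTracker:
    """Tracks a decaying, per-(state, action) weight for the human model.

    Illustrative only: the weight is high where feedback was given recently
    and fades toward zero elsewhere and over time.
    """

    def __init__(self, decay=0.9, scale=1.0):
        self.decay = decay  # multiplicative decay applied each time step
        self.scale = scale  # maximum influence right after feedback
        self.trace = defaultdict(float)

    def step(self):
        """Advance one time step, decaying every stored trace."""
        for key in list(self.trace):
            self.trace[key] *= self.decay

    def mark_feedback(self, state, action):
        """Record that the human just gave feedback on (state, action)."""
        self.trace[(state, action)] = 1.0

    def beta(self, state, action):
        """Current influence weight of the human model for (state, action)."""
        return self.scale * self.trace.get((state, action), 0.0)


tracker = InfluenceTracker(decay=0.5, scale=2.0)
tracker.mark_feedback("s0", "a")
fresh = tracker.beta("s0", "a")      # right after feedback
tracker.step()
faded = tracker.beta("s0", "a")      # one step later
untouched = tracker.beta("s1", "a")  # never received feedback
```

A weight like this could be passed as the scaling factor on a human-reward model, so that the model matters most where and when the trainer has recently been active.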

REFERENCES

An Introduction to Reinforcement Learning
2013, cited by this paper
Augmented Reinforcement Learning for Interaction with Non-expert Humans in Agent Domains
2011, cited by this paper
Integrating reinforcement learning with human demonstrations of varying ability
2011, cited by this paper
Learning Options through Human Interaction
2011, cited by this paper
Dynamic Reward Shaping: Training a Robot by Voice
2010, cited by this paper
Reinforcement Learning Via Practice and Critique Advice
2010, cited by this paper
Combining manual feedback with subsequent MDP reward signals for reinforcement learning
2010, influential reference
RL-Glue: Language-Independent Software for Reinforcement-Learning Experiments
2009, cited by this paper
Interactively shaping agents via human reinforcement: the TAMER framework
2009, influential reference
Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance
2006, cited by this paper
Probabilistic policy reuse in a reinforcement learning agent
2006, cited by this paper
Heuristically Accelerated Q-Learning: A New Approach to Speed Up Reinforcement Learning
2004, cited by this paper
Guiding a Reinforcement Learner with Natural Language Advice: Initial Results in RoboCup Soccer
2004, cited by this paper
Accelerating Reinforcement Learning through Implicit Imitation
2003, cited by this paper
Principled Methods for Advising Reinforcement Learning Agents
2003, influential reference
RIBDB: An SRS Based Infrastructure for REALIS
2002, cited by this paper
Practical Reinforcement Learning in Continuous Spaces
2000, cited by this paper
Reinforcement Learning: An Introduction
1998, cited by this paper
Reward Functions for Accelerated Learning
1994, cited by this paper
Robot Shaping: Developing Autonomous Agents Through Learning
1994, cited by this paper
Robot shaping: developing situated agents through learning
1992, cited by this paper
Automatic State Abstraction from Demonstration (in Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence)
year unknown, cited by this paper

CITED BY

Cross-Scenario Validation of a DQN-LSTM Combined Method for Satellite Orbital Maneuver Detection
2026, cites this paper
Active Query Selection for Crowd-Based Reinforcement Learning
2025, cites this paper
A relevance model of human sparse communication in cooperation
2025, cites this paper
Grower-in-the-Loop Interactive Reinforcement Learning for Greenhouse Climate Control
2025, influential citation
Can I Trust You?—Handling Unreliable Human Action Advice in Interactive Reinforcement Learning
2025, influential citation
Safe Explicable Policy Search
2025, cites this paper
Application and Performance Analysis of MDQN in Microgrid Optimal Scheduling
2025, cites this paper
RL-EAR: reinforcement learning-based energy-aware routing for software-defined wireless sensor network
2025, cites this paper
Continuous alignment of multi-target preferences via instructed diffusion model
2025, cites this paper
A Survey on Human Preference Learning for Aligning Large Language Models
2025, cites this paper
Multi-UAV-UGV Collision-Free Tracking Control via Control Barrier Function-Based Reinforcement Learning
2025, cites this paper
Reinforcement Learning with Human Feedback: A CartPole Case Study
2024, cites this paper
Learning-based personalisation of robot behaviour for robot-assisted therapy
2024, cites this paper
Socially Assistive Robots for patients with Alzheimer's Disease: A scoping review
2024, cites this paper
Taking Training Seriously: Human Guidance and Management-Based Regulation of Artificial Intelligence
2024, cites this paper
Integrating Human Expertise in Continuous Spaces: A Novel Interactive Bayesian Optimization Framework with Preference Expected Improvement
2024, cites this paper
Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback
2024, cites this paper
Merging on a Parallel-Type Entrance Ramp Using Hybrid-Action Reinforcement Learning
2024, cites this paper
Interactive Reinforcement Learning from Natural Language Feedback
2024, cites this paper
Grasp Planning in Manufacturing with NAO Robot Using Reinforcement Learning
2024, cites this paper
Guiding Reinforcement Learning Using Uncertainty-Aware Large Language Models
2024, cites this paper
Reinforcement Learning From Imperfect Corrective Actions And Proxy Rewards
2024, cites this paper
Purposeful Regularization with Reinforcement Learning for Facial Expression Recognition In-the-Wild
2024, cites this paper
Learning to Schedule Resistant to Adversarial Attacks in Diffusion Probabilistic Models Under the Threat of Lipschitz Singularities
2024, cites this paper
Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs
2024, cites this paper
A Survey on Human Preference Learning for Large Language Models
2024, cites this paper
Opinion-Guided Reinforcement Learning
2024, cites this paper
Primitive Skill-Based Robot Learning from Human Evaluative Feedback
2023, cites this paper
Human-AI collaboration in real-world complex environment with reinforcement learning
2023, cites this paper
Fostering human learning in sequential decision-making: Understanding the role of evaluative feedback
2023, cites this paper
State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding
2023, cites this paper
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
2023, cites this paper
Using Learning Curve Predictions to Learn from Incorrect Feedback
2023, cites this paper
Reinforcement Learning Requires Human-in-the-Loop Framing and Approaches
2023, cites this paper
Enhancing Reinforcement Learning Agents with Local Guides
2023, influential citation
Bandit-Based Policy Invariant Explicit Shaping for Incorporating External Advice in Reinforcement Learning
2023, cites this paper
Devs Model Construction As A Reinforcement Learning Problem
2022, cites this paper
Dual Representation for Human-in-the-loop Robot Learning
2022, cites this paper
Building Assistive Sensorimotor Interfaces through Human-in-the-Loop Machine Learning
2022, cites this paper
Trust identification through cognitive correlates with emphasizing attention in cloud robotics
2022, cites this paper
Communicative capital: a key resource for human–machine shared agency and collaborative capacity
2022, influential citation
Can Humans Be out of the Loop?
2022, cites this paper
Reducing Computational Cost During Robot Navigation and Human–Robot Interaction with a Human-Inspired Reinforcement Learning Architecture
2022, cites this paper
Learning from Unreliable Human Action Advice in Interactive Reinforcement Learning
2022, influential citation
Learning Optimization for Dispatch of Interregional Power Grid under Uncertain Environment
2022, cites this paper
A Dual Representation Framework for Robot Learning with Human Guidance
2022, cites this paper
Recent advances in leveraging human guidance for sequential decision-making tasks
2021, cites this paper
Policy Gradient Bayesian Robust Optimization for Imitation Learning
2021, cites this paper
Richer Knowledge Transfer in Teacher Student Framework using State Categorization and Advice Replay
2021, cites this paper
An empirical assessment of deep learning approaches to task-oriented dialog management
2021, cites this paper
Influencing Reinforcement Learning through Natural Language Guidance
2021, cites this paper
Improving reinforcement learning with human assistance: an argument for human subject studies with HIPPO Gym
2021, cites this paper
Let's Do the Time Warp Again: Human Action Assistance for Reinforcement Learning Agents
2021, cites this paper
Accelerating the Learning of TAMER with Counterfactual Explanations
2021, cites this paper
Towards interactive reinforcement learning with intrinsic feedback
2021, cites this paper
Towards Intrinsic Interactive Reinforcement Learning: A Survey
2021, influential citation
Policy invariant explicit shaping: an efficient alternative to reward shaping
2021, cites this paper
Reinforcement Learning with Feedback from Multiple Humans with Diverse Skills
2021, cites this paper
Inference of Simulation Models in Digital Twins by Reinforcement Learning
2021, cites this paper
Accelerating the Convergence of Human-in-the-Loop Reinforcement Learning with Counterfactual Explanations
2021, cites this paper
Learning Quadruped Locomotion Policies with Reward Machines
2021, cites this paper
Human-Augmented Prescriptive Analytics With Interactive Multi-Objective Reinforcement Learning
2021, cites this paper
The actions of others act as a pseudo-reward to drive imitation in the context of social reinforcement learning
2020, cites this paper
FRESH: Interactive Reward Shaping in High-Dimensional State Spaces using Human Feedback
2020, cites this paper
Coaching: accelerating reinforcement learning through human-assisted approach
2020, cites this paper
Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey
2020, cites this paper
How to reduce computation time while sparing performance during robot navigation? A neuro-inspired architecture for autonomous shifting between model-based and model-free learning
2020, cites this paper
Coping with the variability in humans reward during simulated human-robot interactions through the coordination of multiple learning strategies
2020, cites this paper
Interactive RL via Online Human Demonstrations
2020, cites this paper
Reinforcement Learning With Human Advice: A Survey
2020, influential citation
Human-in-the-loop RL with an EEG wearable headset: on effective use of brainwaves to accelerate learning
2020, cites this paper
Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning
2020, cites this paper
A Survey on Interactive Reinforcement Learning: Design Principles and Open Challenges
2020, cites this paper
A conceptual framework for externally-influenced agents: an assisted reinforcement learning review
2020, cites this paper
A Review on Interactive Reinforcement Learning From Human Social Feedback
2020, cites this paper
Battlesnake Challenge: A Multi-agent Reinforcement Learning Playground with Human-in-the-loop
2020, cites this paper
Maximizing BCI Human Feedback using Active Learning
2020, cites this paper
Document-editing Assistants and Model-based Reinforcement Learning as a Path to Conversational AI
2020, cites this paper
Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems
2020, cites this paper
Meta-Reward Model Based on Trajectory Data with k-Nearest Neighbors Method
2020, cites this paper
CLEVR Parser: A Graph Parser Library for Geometric Learning on Language Grounded Image Scenes
2020, cites this paper
Reinforcement Learning Approaches in Social Robotics
2020, cites this paper
Multi-Channel Interactive Reinforcement Learning for Sequential Tasks
2020, cites this paper
Human Feedback as Action Assignment in Interactive Reinforcement Learning
2020, cites this paper
SCOBO: Sparsity-Aware Comparison Oracle Based Optimization
2020, cites this paper
Integrating an Observer in Interactive Reinforcement Learning to Learn Legible Trajectories
2020, cites this paper
Learning synergies based in-hand manipulation with reward shaping
2020, cites this paper
Integrating Machine Learning with Human Knowledge
2020, cites this paper
A Survey of Planning and Learning in Games
2020, cites this paper
Towards Preference Learning for Autonomous Ground Robot Navigation Tasks
2020, cites this paper
Useful Policy Invariant Shaping from Arbitrary Advice
2020, cites this paper
Robot Learning in Mixed Adversarial and Collaborative Settings
2020, cites this paper
Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation
2020, cites this paper
A One-bit, Comparison-Based Gradient Estimator
2020, cites this paper
Accelerating decentralized reinforcement learning of complex individual behaviors
2019, cites this paper
Simulating Emergent Properties of Human Driving Behavior Using Multi-Agent Reward Augmented Imitation Learning
2019, cites this paper
Active Attention-Modified Policy Shaping: Socially Interactive Agents Track
2019, cites this paper
RIDM: Reinforced Inverse Dynamics Modeling for Learning from a Single Observed Demonstration
2019, influential citation
SAIL: Simulation-Informed Active In-the-Wild Learning
2019, cites this paper
Learning from Human Feedback: A Comparison of Interactive Reinforcement Learning Algorithms
2019, influential citation