Scaling life-long off-policy learning

Published 2012 in International Conference on Development and Learning

ABSTRACT

In this paper we pursue an approach to scaling life-long learning using parallel off-policy reinforcement learning algorithms. In life-long learning, a robot continually learns from a lifetime of experience, slowly acquiring skills and knowledge and applying them to new situations. Many of the benefits of life-long learning result from scaling the amount of training data processed by the robot to long sensorimotor streams. Another dimension of scaling can be added by allowing off-policy sampling from the unending stream of sensorimotor data generated by a long-lived robot. Recent algorithmic developments have made it possible, for the first time, to apply off-policy algorithms to life-long learning in a sound way. We assess the scalability of these off-policy algorithms on a physical robot. We show that hundreds of accurate multi-step predictions about several policies can be learned in parallel and in real time. We present the first online measures of off-policy learning progress. Finally, we demonstrate that our robot, using the new off-policy measures, can learn 8000 predictions about 300 distinct policies, a substantial increase in scale compared to previous simulated and robotic life-long learning systems.
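The recent algorithmic developments mentioned in the abstract include the gradient temporal-difference methods listed among the references. The following is a minimal sketch, not the authors' implementation. It assumes linear function approximation, a single feature vector shared by every prediction, and a constant discount γ and trace decay λ per prediction, and it runs the GTD(λ) update for many predictions in parallel as a few matrix operations. The class ParallelGTDLambda, its parameter names, and the progress_estimate method are all hypothetical. In particular, progress_estimate is a simple stand-in (an estimate of the projected Bellman error built from the auxiliary weights) and is not the online measure defined in the paper.

```python
import numpy as np


class ParallelGTDLambda:
    """Hypothetical sketch: N linear GTD(lambda) predictions sharing one feature vector.

    Each row of theta is one prediction (one general value function) with its
    own cumulant, discount gamma, trace decay lambda and target policy.
    """

    def __init__(self, n_predictions, n_features, alpha, beta, gamma, lam, tau=0.01):
        self.alpha, self.beta, self.tau = alpha, beta, tau
        self.gamma = np.asarray(gamma, dtype=float)  # shape (N,); constant per prediction (assumption)
        self.lam = np.asarray(lam, dtype=float)      # shape (N,); constant per prediction (assumption)
        self.theta = np.zeros((n_predictions, n_features))  # primary weights
        self.w = np.zeros((n_predictions, n_features))      # auxiliary gradient-correction weights
        self.e = np.zeros((n_predictions, n_features))      # eligibility traces
        self.g_bar = np.zeros((n_predictions, n_features))  # running average of delta * e

    def predict(self, x):
        """Current value of every prediction for feature vector x, shape (N,)."""
        return self.theta @ x

    def update(self, x, x_next, cumulant, rho):
        """One GTD(lambda) step for all N predictions from a single transition.

        cumulant: shape (N,), the signal each prediction accumulates.
        rho: shape (N,), importance ratio pi_i(a|s) / mu(a|s) for each target policy.
        """
        cumulant = np.asarray(cumulant, dtype=float)
        rho = np.asarray(rho, dtype=float)
        # TD error of each prediction.
        delta = cumulant + self.gamma * (self.theta @ x_next) - self.theta @ x
        # Importance-weighted traces: e <- rho * (x + gamma * lambda * e).
        self.e = rho[:, None] * (x[None, :] + (self.gamma * self.lam)[:, None] * self.e)
        e_dot_w = np.sum(self.e * self.w, axis=1)
        # Primary update: delta * e minus the gradient-correction term.
        correction = (self.gamma * (1.0 - self.lam) * e_dot_w)[:, None] * x_next[None, :]
        self.theta += self.alpha * (delta[:, None] * self.e - correction)
        # w is an LMS estimate of E[x x^T]^{-1} E[delta * e] (uses the old w on the right).
        self.w += self.beta * (delta[:, None] * self.e - (self.w @ x)[:, None] * x[None, :])
        # Exponentially weighted average of delta * e, used by progress_estimate.
        self.g_bar += self.tau * (delta[:, None] * self.e - self.g_bar)
        return delta

    def progress_estimate(self):
        """Hypothetical per-prediction error estimate w . mean(delta * e).

        Because w approximates E[x x^T]^{-1} E[delta e], this product approximates
        the projected Bellman error once w has settled. Illustration only.
        """
        return np.sum(self.w * self.g_bar, axis=1)


# On-policy usage: a constant bias feature and cumulant 1, so the true values are 1 / (1 - gamma).
agent = ParallelGTDLambda(2, 1, alpha=0.1, beta=0.05, gamma=[0.5, 0.9], lam=[0.0, 0.0])
x = np.array([1.0])
for _ in range(20000):
    agent.update(x, x, cumulant=[1.0, 1.0], rho=[1.0, 1.0])
print(agent.predict(x), agent.progress_estimate())
```

With a constant bias feature, the two predictions approach 1/(1 − γ), that is, 2 and 10. The off-policy case can be checked the same way. Alternate two actions under a uniform behavior policy while the target policy always takes the first action, so ρ = 2 for that action and 0 otherwise. Give cumulant 1 only for the first action and set γ = 0. The estimate then approaches the target policy's value of 1 rather than the behavior policy's average of 0.5. Because all predictions share one feature vector, the per-step cost is a handful of matrix operations over an N × d weight array. This shared structure is what makes learning thousands of predictions in real time plausible.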

PUBLICATION RECORD

Publication year
2012
Venue
International Conference on Development and Learning
Publication date
2012-06-27
Fields of study
Computer Science
Identifiers
DOI 10.1109/DevLrn.2012.6400860 arXiv 1206.6262
Source metadata
Semantic Scholar

REFERENCES

Dynamic switching and real-time machine learning for improved human control of assistive biomedical robots
2012 · cited by this paper
Learning to Make Predictions In Partially Observable Environments Without a Generative Model
2011 · cited by this paper
Gradient temporal-difference learning algorithms
2011 · influential reference
Multi-timescale nexting in a reinforcement learning robot
2011 · cited by this paper
Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction
2011 · influential reference
Scaling Up Machine Learning: Parallel Online Learning
2011 · cited by this paper
The Fixed Points of Off-Policy TD
2011 · cited by this paper
An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems
2011 · cited by this paper
Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective
2010 · cited by this paper
Multitask Learning without Label Correspondences
2010 · cited by this paper
Fast gradient-descent methods for temporal-difference learning with linear function approximation
2009 · cited by this paper
Highly scalable appearance-only SLAM - FAB-MAP 2.0
2009 · cited by this paper
Temporal Abstraction in Temporal-difference Networks
2005 · cited by this paper
Robot learning from demonstration
2004 · cited by this paper
Intrinsically Motivated Reinforcement Learning
2004 · cited by this paper
Temporal-Difference Networks
2004 · cited by this paper
A New Approach to Linear Filtering and Prediction Problems
2002 · cited by this paper
Probabilistic robotics
2002 · cited by this paper
Predictive Representations of State
2001 · cited by this paper
Temporal Abstraction
2000 · cited by this paper
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
1999 · cited by this paper
Reinforcement Learning: An Introduction
1998 · cited by this paper
Residual Algorithms: Reinforcement Learning with Function Approximation
1995 · cited by this paper
Lifelong robot learning
1993 · cited by this paper
A possibility for implementing curiosity and boredom in model-building neural controllers
1991 · cited by this paper
Intrinsic Motivation Systems for Autonomous Mental Development (IEEE Transactions on Evolutionary Computation)
year unknown · cited by this paper
Policy Search for Motor Primitives in Robotics
year unknown · cited by this paper

CITED BY

Loosely consistent emphatic temporal-difference learning
2023 · cites this paper
The Emphatic Approach to Average-Reward Policy Evaluation
2022 · cites this paper
Trajectory Tracking Control of Autonomous Vehicles Based on Reinforcement Learning and Curvature Feedforward
2022 · cites this paper
Adapting Behaviour via Intrinsic Reward
2021 · cites this paper
Robot perceptual adaptation to environment changes for long-term human teammate following
2020 · cites this paper
Adapting Behavior via Intrinsic Reward: A Survey and Empirical Study
2020 · cites this paper
Transfer Learning in Attack Avoidance Games
2020 · cites this paper
Fast Adaptation via Policy-Dynamics Value Functions
2020 · cites this paper
Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study
2019 · cites this paper
Self-Organizing Maps as a Storage and Transfer Mechanism in Reinforcement Learning
2018 · cites this paper
Self-organizing maps for storage and transfer of knowledge in reinforcement learning
2018 · cites this paper
Identification and off-policy learning of multiple objectives using adaptive clustering
2017 · cites this paper
Experience Replay Using Transition Sequences
2017 · cites this paper
Multi-step Off-policy Learning Without Importance Sampling Ratios
2017 · cites this paper
Lifelong learning for disturbance rejection on mobile robots
2016 · cites this paper
Experience-Based Generation of Maintenance and Achievement Goals on a Mobile Robot
2016 · cites this paper
Learnable knowledge for autonomous agents
2015 · cites this paper
The Online Coupon-Collector Problem and Its Application to Lifelong Reinforcement Learning
2015 · cites this paper
DEVELOPING A PREDICTIVE APPROACH TO KNOWLEDGE
2015 · cites this paper
Off-Policy General Value Functions to Represent Dynamic Role Assignments in RoboCup 3D Soccer Simulation
2014 · cites this paper
Surprise and Curiosity for Big Data Robotics
2014 · cites this paper
Prediction Driven Behavior: Learning Predictions that Drive Fixed Responses
2014 · cites this paper
Predictive Hebbian association of time-delayed inputs with actions in a developmental robot platform
2014 · cites this paper
Dynamic role assignment using general value functions
2013 · cites this paper
Multi-timescale nexting in a reinforcement learning robot
2011 · cites this paper