Provably Efficient Imitation Learning from Observation Alone

Wen Sun, Anirudh Vemula, Byron Boots, J. Bagnell

Published 2019 in International Conference on Machine Learning

ABSTRACT

We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL is the first provably efficient algorithm in the ILFO setting: it learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample-efficient learning algorithms beyond existing results, which typically only consider tabular reinforcement learning settings or settings that require access to a near-optimal reset distribution. We also investigate the extension of FAIL to a model-based setting. Finally, we demonstrate the efficacy of FAIL on multiple OpenAI Gym control tasks.
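
The abstract describes FAIL as minimizing an Integral Probability Metric (IPM) between the expert's and the learner's observation distributions. As a rough illustration of that quantity only, and not the paper's implementation, the sketch below estimates an IPM from samples over a small finite class of discriminators. The function name `empirical_ipm`, the threshold discriminators, and the scalar observations are all assumptions made for this example.

```python
from typing import Callable, Sequence

# A discriminator maps one observation to a bounded real value.
# Scalar observations are an assumption made to keep the sketch small.
Discriminator = Callable[[float], float]


def empirical_ipm(
    expert_obs: Sequence[float],
    learner_obs: Sequence[float],
    discriminators: Sequence[Discriminator],
) -> float:
    """Sample estimate of max over f in F of |E_expert[f] - E_learner[f]|."""

    def mean(f: Discriminator, xs: Sequence[float]) -> float:
        return sum(f(x) for x in xs) / len(xs)

    return max(
        abs(mean(f, expert_obs) - mean(f, learner_obs))
        for f in discriminators
    )


# Hypothetical discriminator class: threshold indicators with values in {0, 1}.
thresholds = [0.0, 0.5, 1.0]
discriminators = [lambda x, t=t: 1.0 if x > t else 0.0 for t in thresholds]

# Made-up observation samples at a single time step.
expert_obs = [0.9, 1.1, 1.2, 0.8]
learner_a = [0.1, 0.2, 0.0, 0.3]  # far from the expert's observations
learner_b = [0.7, 1.0, 1.3, 0.9]  # close to the expert's observations

gap_a = empirical_ipm(expert_obs, learner_a, discriminators)
gap_b = empirical_ipm(expert_obs, learner_b, discriminators)
print(gap_a, gap_b)
```

A learner whose observations resemble the expert's yields a smaller gap, so a policy search that lowers this estimate at each time step moves the learner's observation distribution toward the expert's. Richer discriminator classes would change which differences the metric can detect.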

PUBLICATION RECORD

Publication year
2019
Venue
International Conference on Machine Learning
Publication date
2019-05-24
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1905.10948
Source metadata
Semantic Scholar

REFERENCES

Imitation Learning from Observation
2019cited by this paper
Information-Theoretic Considerations in Batch Reinforcement Learning
2019cited by this paper
Behavioral Cloning from Observation
2018cited by this paper
Imitating Latent Policies from Observation
2018cited by this paper
Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning
2018cited by this paper
Model-Based Reinforcement Learning in Contextual Decision Processes
2018cited by this paper
Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches
2018cited by this paper
Generalization and Equilibrium in Generative Adversarial Nets (GANs)
2017cited by this paper
Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation
2017cited by this paper
Agile Autonomous Driving using End-to-End Deep Imitation Learning
2017cited by this paper
Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
2017cited by this paper
Combining self-supervised learning and imitation for vision-based rope manipulation
2017cited by this paper
Deep Q-learning From Demonstrations
2017cited by this paper
Wasserstein GAN
2017cited by this paper
Mastering the game of Go with deep neural networks and tree search
2016cited by this paper
OpenAI Gym
2016cited by this paper
PAC Reinforcement Learning with Rich Observations
2016cited by this paper
Analysis of Classification-based Policy Iteration Algorithms
2016cited by this paper
Generative Adversarial Imitation Learning
2016cited by this paper
Contextual Decision Processes with low Bellman rank are PAC-Learnable
2016cited by this paper
Improving Multi-Step Prediction of Learned Time Series Models
2015cited by this paper
Learning to Search for Dependencies
2015cited by this paper
Learning to Filter with Predictive State Inference Machines
2015cited by this paper
Trust Region Policy Optimization
2015cited by this paper
Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
2015cited by this paper
Learning to Search Better than Your Teacher
2015influential reference
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
2014cited by this paper
Reinforcement and Imitation Learning via Interactive No-Regret Learning
2014cited by this paper
Adam: A Method for Stochastic Optimization
2014cited by this paper
Online Learning and Online Convex Optimization
2012cited by this paper
On the empirical estimation of integral probability metrics
2012cited by this paper
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
2010influential reference
A Reduction from Apprenticeship Learning to Classification
2010cited by this paper
Efficient Reductions for Imitation Learning
2010cited by this paper
Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions
2009cited by this paper
Search-based structured prediction
2009cited by this paper
Finite-Time Bounds for Fitted Value Iteration
2008cited by this paper
Error limiting reductions between classification tasks
2005cited by this paper
Error Bounds for Approximate Value Iteration
2005cited by this paper
Distance-Based Classification with Lipschitz Functions
2004cited by this paper
Exploration in Metric State Spaces
2003cited by this paper
Policy Search by Dynamic Programming
2003cited by this paper
Equivalence notions and model minimization in Markov decision processes
2003cited by this paper
Online Convex Programming and Generalized Infinitesimal Gradient Ascent
2003cited by this paper
Approximately Optimal Approximate Reinforcement Learning
2002influential reference
Predicting Time Series with Support Vector Machines
1997cited by this paper
Integral Probability Metrics and Their Generating Classes of Functions
1997influential reference
Learning to predict by the methods of temporal differences
1988cited by this paper

CITED BY

CORIMP: A correlation-driven imputation approach for offline reinforcement learning with incomplete action data
2026cites this paper
Imitation Learning from a Single Temporally Misaligned Video
2025cites this paper
On Generalization and Distributional Update for Mimicking Observations with Adequate Exploration
2025cites this paper
Model-based Imitation Learning from Observation for input estimation in monitored systems
2025cites this paper
Near-Optimal Second-Order Guarantees for Model-Based Adversarial Imitation Learning
2025cites this paper
Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning
2025cites this paper
IL-SOAR: Imitation Learning with Soft Optimistic Actor cRitic
2025cites this paper
Controlling Large Language Model with Latent Actions
2025cites this paper
A Dual Approach to Imitation Learning from Observations with Offline Datasets
2024cites this paper
Hybrid Reinforcement Learning from Offline Observation Alone
2024influential citation
How does Inverse RL Scale to Large State Spaces? A Provably Efficient Approach
2024cites this paper
A Survey of Imitation Learning Methods, Environments and Metrics
2024influential citation
Adversarial Imitation Learning via Boosting
2024cites this paper
Tiny Reinforcement Learning for Quadruped Locomotion using Decision Transformers
2024cites this paper
A Model-Based Approach for Improving Reinforcement Learning Efficiency Leveraging Expert Observations
2024cites this paper
Imitation Learning: A Survey of Learning Methods, Environments and Metrics
2024influential citation
Interactive and Hybrid Imitation Learning: Provably Beating Behavior Cloning
2024cites this paper
Dynamic Non-Prehensile Object Transport via Model-Predictive Reinforcement Learning
2024cites this paper
Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching
2024cites this paper
Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation
2024cites this paper
MILES: Making Imitation Learning Easy with Self-Supervision
2024cites this paper
Diffusing States and Matching Scores: A New Framework for Imitation Learning
2024cites this paper
Learning Human Behavior in Shared Control: Adaptive Inverse Differential Game Approach
2023cites this paper
Agnostic Interactive Imitation Learning: New Theory and Practical Algorithms
2023cites this paper
Imitation-Guided Multimodal Policy Generation from Behaviourally Diverse Demonstrations
2023cites this paper
ILPO-MP: Mode Priors Prevent Mode Collapse when Imitating Latent Policies from Observations
2023cites this paper
What Matters to You? Towards Visual Representation Alignment for Robot Learning
2023cites this paper
Offline Imitation Learning with Variational Counterfactual Reasoning
2023cites this paper
Imitation Learning from Observation through Optimal Transport
2023cites this paper
Automated Action Evaluation for Robotic Imitation Learning via Siamese Neural Networks
2023cites this paper
Imitation Learning for Financial Applications
2023cites this paper
Learning non-Markovian Decision-Making from State-only Sequences
2023cites this paper
Identifiability and Generalizability in Constrained Inverse Reinforcement Learning
2023cites this paper
GAN-MPC: Training Model Predictive Controllers with Parameterized Cost Functions using Demonstrations from Non-identical Experts
2023cites this paper
A reinforcement learning algorithm acquires demonstration from the training agent by dividing the task space
2023cites this paper
A posteriori control densities: Imitation learning from partial observations
2023cites this paper
Human-in-the-Loop Behavior Modeling via an Integral Concurrent Adaptive Inverse Reinforcement Learning
2023cites this paper
Inverse Reinforcement Learning without Reinforcement Learning
2023cites this paper
Learning Stabilization Control from Observations by Learning Lyapunov-like Proxy Models
2023cites this paper
The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms
2023cites this paper
Information-theoretic policy learning from partial observations with fully informed decision makers
2022cites this paper
Causal Imitation Learning under Temporally Correlated Noise
2022cites this paper
A Ranking Game for Imitation Learning
2022cites this paper
Offline Reinforcement Learning with Realizability and Single-policy Concentrability
2022cites this paper
Imitation learning by state-only distribution matching
2022cites this paper
LobsDICE: Offline Imitation Learning from Observation via Stationary Distribution Correction Estimation
2022cites this paper
Online Learning Human Behavior for a Class of Human-in-the-Loop Systems via Adaptive Inverse Optimal Control
2022cites this paper
SelfD: Self-Learning Large-Scale Driving Policies From the Web
2022cites this paper
Imitation Learning from Observations under Transition Model Disparity
2022cites this paper
Model-based Offline Imitation Learning with Non-expert Data
2022cites this paper
Towards Uniformly Superhuman Autonomy via Subdominance Minimization
2022cites this paper
Improved Policy Optimization for Online Imitation Learning
2022cites this paper
Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis
2022cites this paper
Masked Imitation Learning: Discovering Environment-Invariant Modalities in Multimodal Demonstrations
2022cites this paper
State Advantage Weighting for Offline RL
2022cites this paper
Model Predictive Control via On-Policy Imitation Learning
2022cites this paper
LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation
2022cites this paper
Out-of-Dynamics Imitation Learning from Multimodal Demonstrations
2022cites this paper
Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward
2022cites this paper
On Efficient Online Imitation Learning via Classification
2022cites this paper
Feedback in Imitation Learning: Confusion on Causality and Covariate Shift
2021influential citation
On the Value of Interaction and Function Approximation in Imitation Learning
2021cites this paper
Hierarchically Integrated Models: Learning to Navigate from Heterogeneous Robots
2021cites this paper
Learning Feasibility to Imitate Demonstrators with Different Dynamics
2021cites this paper
Imitation Learning: Progress, Taxonomies and Challenges
2021cites this paper
Imitation Learning by Reinforcement Learning
2021cites this paper
Recent advances in leveraging human guidance for sequential decision-making tasks
2021cites this paper
A Simple Reward-free Approach to Constrained Reinforcement Learning
2021cites this paper
Imitation by Predicting Observations
2021cites this paper
Towards Teaching Machines with Language: Interactive Learning from Only Language Descriptions of Activities
2021cites this paper
Multi-Robot Deep Reinforcement Learning for Mobile Navigation
2021cites this paper
Imitation Learning: Progress, Taxonomies and Opportunities
2021cites this paper
MobILE: Model-Based Imitation Learning From Observation Alone
2021influential citation
Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap
2021cites this paper
XIRL: Cross-embodiment Inverse Reinforcement Learning
2021cites this paper
Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage
2021cites this paper
Multi-Agent Reinforcement Learning-Based Resource Management for End-to-End Network Slicing
2021cites this paper
DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation
2021cites this paper
Learning From Imperfect Demonstrations From Agents With Varying Dynamics
2021cites this paper
Off-Policy Imitation Learning from Observations
2021cites this paper
Of Moments and Matching: Trade-offs and Treatments in Imitation Learning
2021cites this paper
Provably Breaking the Quadratic Error Compounding Barrier in Imitation Learning, Optimally
2021cites this paper
Optimism is All You Need: Model-Based Imitation Learning From Observation Alone
2021influential citation
Interactive Learning from Activity Description
2021cites this paper
Feedback in Imitation Learning: The Three Regimes of Covariate Shift
2021influential citation
Provably Efficient Model-based Policy Adaptation
2020cites this paper
Multi-Robot Deep Reinforcement Learning via Hierarchically Integrated Models
2020cites this paper
Robust Imitation Learning from Noisy Demonstrations
2020cites this paper
Toward the Fundamental Limits of Imitation Learning
2020cites this paper
Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch
2020cites this paper
Constrained episodic reinforcement learning in concave-convex and knapsack settings
2020cites this paper
Energy-Based Imitation Learning
2020cites this paper
State-Only Imitation Learning for Dexterous Manipulation
2020cites this paper
Reparameterized Variational Divergence Minimization for Stable Imitation
2020influential citation
Provably Efficient Third-Person Imitation from Offline Observation
2020influential citation
State-only Imitation with Transition Dynamics Mismatch
2020cites this paper
Provable Representation Learning for Imitation Learning via Bi-level Optimization
2020influential citation
Estimating Q(s, s') with Deep Deterministic Dynamics Gradients
2020cites this paper
Learning Internal State Memory Representations from Observation
2019influential citation
Imitation Learning as f-Divergence Minimization
2019cites this paper