A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

Published 2010 in International Conference on Artificial Intelligence and Statistics

ABSTRACT

Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches provide stronger guarantees in this setting, but remain somewhat unsatisfactory as they train either non-stationary or stochastic policies and require a large number of iterations. In this paper, we propose a new iterative algorithm that trains a stationary deterministic policy and can be seen as a no-regret algorithm in an online learning setting. We show that any such no-regret algorithm, combined with additional reduction assumptions, must find a policy with good performance under the distribution of observations it induces in such sequential settings. We demonstrate that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
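
The abstract describes the iterative algorithm only at a high level. As a rough illustration, the Python sketch below shows one way a dataset-aggregation loop of this kind can be organized: execute the current policy, have the expert label the states that were actually visited, add those labels to one growing dataset, and retrain a single deterministic policy on everything collected so far. The 1-D toy environment, the helper names (`expert`, `step`, `fit`, `rollout`, `aggregate_and_retrain`), the tabular learner, and all constants are assumptions made for illustration; none of them come from the paper.

```python
import random
from collections import Counter, defaultdict

K = 5         # hypothetical toy world: states are integers in [-K, K], 0 is the centre
HORIZON = 30  # steps per rollout (assumed value)

def expert(s):
    """Hypothetical expert: move one step back toward the centre (0 at the centre)."""
    return (s < 0) - (s > 0)

def step(s, a, rng):
    """Toy dynamics: apply the action, then a random unit push 20% of the time."""
    w = rng.choice((-1, 1)) if rng.random() < 0.2 else 0
    return max(-K, min(K, s + a + w))

def fit(dataset):
    """Tabular majority-vote learner; states never seen default to action 0."""
    votes = defaultdict(Counter)
    for s, a in dataset:
        votes[s][a] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return lambda s: table.get(s, 0)

def rollout(policy, rng):
    """Run `policy` from the centre and return the states it visits."""
    s, visited = 0, []
    for _ in range(HORIZON):
        visited.append(s)
        s = step(s, policy(s), rng)
    return visited

def aggregate_and_retrain(n_iters, seed=0):
    rng = random.Random(seed)
    dataset = []
    policy = expert  # assumption: the first rollout is driven by the expert
    for _ in range(n_iters):
        visited = rollout(policy, rng)                 # states induced by the current policy
        dataset += [(s, expert(s)) for s in visited]   # expert labels those states
        policy = fit(dataset)                          # retrain on the aggregated data
    return policy, dataset

if __name__ == "__main__":
    policy, dataset = aggregate_and_retrain(n_iters=10)
    covered = sorted({s for s, _ in dataset})
    print("states with expert labels:", covered)
```

The point of the design is that the training states come from the learner's own rollouts rather than only from expert demonstrations, so the dataset covers the states the learned policy actually reaches. Because the toy expert is deterministic, the retrained table agrees with the expert on every state in the aggregated dataset.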

PUBLICATION RECORD

Publication year: 2010
Venue: International Conference on Artificial Intelligence and Statistics
Publication date: 2010-11-02
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1011.0686
Source metadata: Semantic Scholar

REFERENCES

Interactive Policy Learning through Confidence-Based Autonomy (2014, cited by this paper)
Efficient Reductions for Imitation Learning (2010, influential reference)
Mario AI competition (2009, cited by this paper)
Search-based structured prediction (2009, influential reference)
Fast Rates for Regularized Objectives (2008, cited by this paper)
On the Generalization Ability of Online Strongly Convex Programming Algorithms (2008, influential reference)
High Performance Outdoor Navigation from Overhead Data using Imitation Learning (2008, cited by this paper)
Mind the Duality Gap: Logarithmic regret algorithms for online optimization (2008, cited by this paper)
Subgradient Methods for Structured Prediction (2007, cited by this paper)
(Online) Subgradient Methods for Structured Prediction (2007, influential reference)
Logarithmic regret algorithms for online convex optimization (2006, cited by this paper)
Boosting Structured Prediction for Imitation Learning (2006, cited by this paper)
Error limiting reductions between classification tasks (2005, influential reference)
Apprenticeship learning via inverse reinforcement learning (2004, influential reference)
Max-Margin Markov Networks (2003, influential reference)
Approximately Optimal Approximate Reinforcement Learning (2002, cited by this paper)
On the generalization ability of on-line learning algorithms (2001, influential reference)
Is imitation learning the route to humanoid robots? (1999, cited by this paper)
Robotics and autonomous systems (1988, cited by this paper)

CITED BY

Supervised Fine-Tuning Needs to Unlock the Potential of Token Priority (2026, cites this paper)
Human Preference Modeling Using Visual Motion Prediction Improves Robot Skill Learning from Egocentric Human Video (2026, cites this paper)
Collision-Free Humanoid Traversal in Cluttered Indoor Scenes (2026, cites this paper)
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning (2026, cites this paper)
VISOR: VIsual Spatial Object Reasoning for Language-driven Object Navigation (2026, cites this paper)
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning (2026, cites this paper)
Token-Level LLM Collaboration via FusionRoute (2026, cites this paper)
Physics-informed embodied intelligence in the foundation model era: Advancing robot manipulation for smart manufacturing (2026, cites this paper)
An Observation Feature Study of Robot Imitation Learning for Autonomous Social Navigation on Construction Sites (2026, cites this paper)
ForSim: Stepwise Forward Simulation for Traffic Policy Fine-Tuning (2026, cites this paper)
InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions (2026, cites this paper)
Why Look at It at All?: Vision-Free Multifingered Blind Grasping Using Uniaxial Fingertip Force Sensing (2026, cites this paper)
Differentiate-and-Inject: Enhancing VLAs via Functional Differentiation Induced by In-Parameter Structural Reasoning (2026, cites this paper)
Geometry of Uncertainty: Learning Metric Spaces for Multimodal State Estimation in RL (2026, cites this paper)
RAP: Role-Aware Joint Prediction and Planning in Autonomous Driving (2026, cites this paper)
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs (2026, cites this paper)
Large language model guided deep reinforcement learning for safe autonomous vehicle decision making (2026, cites this paper)
Press Start to Charge: Videogaming the Online Centralized Charging Scheduling Problem (2026, cites this paper)
Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward (2026, cites this paper)
Information Filtering via Variational Regularization for Robot Manipulation (2026, cites this paper)
Theoretical Challenges in Learning for Branch-and-Cut (2026, cites this paper)
World-Gymnast: Training Robots with Reinforcement Learning in a World Model (2026, cites this paper)
TMS: Trajectory-Mixed Supervision for Reward-Free, On-Policy SFT (2026, cites this paper)
InterPReT: Interactive Policy Restructuring and Training Enable Effective Imitation Learning from Laypersons (2026, cites this paper)
Pairwise is Not Enough: Hypergraph Neural Networks for Multi-Agent Pathfinding (2026, cites this paper)
Object search strategy for service robots with knowledge-based viewpoint selection and hierarchical action decisions (2026, cites this paper)
Learning Human-Like Badminton Skills for Humanoid Robots (2026, cites this paper)
MOSAIC: Bridging the Sim-to-Real Gap in Generalist Humanoid Motion Tracking and Teleoperation with Rapid Residual Adaptation (2026, cites this paper)
APEX: Learning Adaptive High-Platform Traversal for Humanoid Robots (2026, cites this paper)
ExtremControl: Low-Latency Humanoid Teleoperation with Direct Extremity Control (2026, cites this paper)
A Survey on Deep Generative Models for Robot Learning From Multimodal Demonstrations (2026, cites this paper)
Adversarial traffic scene generation considering harm, rarity, and ambiguity for autonomous driving testing (2026, cites this paper)
Imitation from Observations with Trajectory-Level Generative Embeddings (2026, cites this paper)
Reinforcement learning in robotic systems: A review on sim-to-real transfer (2026, cites this paper)
Imitation Learning for Combinatorial Optimisation under Uncertainty (2026, cites this paper)
Teaching Robots Like Dogs: Learning Agile Navigation from Luring, Gesture, and Speech (2026, influential citation)
Vision-driven river following of UAV via safe reinforcement learning using semantic dynamics model (2026, cites this paper)
InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning (2026, cites this paper)
Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models (2026, influential citation)
Fauna Sprout: A lightweight, approachable, developer-ready humanoid robot (2026, cites this paper)
Teaching LLMs to Ask: Self-Querying Category-Theoretic Planning for Under-Specified Reasoning (2026, cites this paper)
PocketDP3: Efficient Pocket-Scale 3D Visuomotor Policy (2026, cites this paper)
Self-Imitated Diffusion Policy for Efficient and Robust Visual Navigation (2026, cites this paper)
Causal Imitation Learning Under Measurement Error and Distribution Shift (2026, influential citation)
Expanding the Capabilities of Reinforcement Learning via Text Feedback (2026, cites this paper)
Learning Adaptive Cross-Embodiment Visuomotor Policy with Contrastive Prompt Orchestration (2026, cites this paper)
Robust Intervention Learning from Emergency Stop Interventions (2026, cites this paper)
Maximum Likelihood Reinforcement Learning (2026, cites this paper)
EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models (2026, cites this paper)
SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models (2026, cites this paper)
AgentXRay: White-Boxing Agentic Systems via Workflow Reconstruction (2026, cites this paper)
ODPL: An objective-driven decomposed policy learning framework for urban autonomous driving control (2026, cites this paper)
Nipping the Drift in the Bud: Retrospective Rectification for Robust Vision-Language Navigation (2026, influential citation)
Deep learning-based robotic cloth manipulation applications: systematic review, challenges and opportunities for physical AI (2026, cites this paper)
Reinforcement Inference: Leveraging Uncertainty for Self-Correcting Language Model Reasoning (2026, cites this paper)
Vision-Guided Outdoor Flight and Obstacle Evasion via Reinforcement Learning (2026, cites this paper)
TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents (2026, cites this paper)
χ₀: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies (2026, influential citation)
Beyond Next-Token Alignment: Distilling Multimodal Large Language Models via Token Interactions (2026, cites this paper)
Phase-Aware Policy Learning for Skateboard Riding of Quadruped Robots via Feature-wise Linear Modulation (2026, cites this paper)
RISE: Self-Improving Robot Policy with Compositional World Model (2026, cites this paper)
Synthesis of model predictive control and reinforcement learning: Survey and classification (2026, cites this paper)
Playback Optimization for 360° Live Streaming via Bitrate Estimation and Download Scheduling (2026, cites this paper)
Imitation and Exploration: Learning for Vision-Based Communication-Free Multi-UAV Coordination in Cluttered Environments (2026, cites this paper)
PlanCiLQR: Hierarchical trajectory planning for autonomous driving using constrained iterative linear quadratic regulator (2026, cites this paper)
End-to-End Autonomous Driving: From Classic Paradigm to Large Model Empowerment—A Comprehensive Survey (2026, cites this paper)
Enhancing Hierarchical Reinforcement Learning With Symbolic Planning for Long-Horizon Tasks (2026, cites this paper)
Robot policy learning from demonstrations and visual rewards for sequential manipulation tasks (2026, cites this paper)
AI Agent Systems: Architectures, Applications, and Evaluation (2026, cites this paper)
SOP: A Scalable Online Post-Training System for Vision-Language-Action Models (2026, cites this paper)
Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing (2026, cites this paper)
Interactive Distillation for Cooperative Multi-Agent Reinforcement Learning (2026, cites this paper)
Stable On-Policy Distillation through Adaptive Target Reformulation (2026, cites this paper)
Heterogeneous Multi-Expert Reinforcement Learning for Long-Horizon Multi-Goal Tasks in Autonomous Forklifts (2026, cites this paper)
DexRepNet++: Learning Dexterous Robotic Manipulation With Geometric and Spatial Hand-Object Representations (2026, cites this paper)
The Great March 100: 100 Detail-oriented Tasks for Evaluating Embodied AI Agents (2026, cites this paper)
Offline inverse reinforcement learning for joint optimization of energy costs and demand charge in industrial PV-battery load systems (2026, cites this paper)
FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions (2026, cites this paper)
Point Bridge: 3D Representations for Cross Domain Policy Learning (2026, cites this paper)
Deep Reinforcement Learning in Software Defined Networking: A Survey, Research Challenges, and Future Perspectives (2026, cites this paper)
NaVIDA: Vision-Language Navigation with Inverse Dynamics Augmentation (2026, cites this paper)
Entropy-Guided k-Guard Sampling for Long-Horizon Autoregressive Video Generation (2026, cites this paper)
Self-Distillation Enables Continual Learning (2026, influential citation)
APC-RL: Exceeding Data-Driven Behavior Priors with Adaptive Policy Composition (2026, cites this paper)
Reinforcement Learning via Self-Distillation (2026, cites this paper)
OVD: On-policy Verbal Distillation (2026, cites this paper)
Matching Multiple Experts: On the Exploitability of Multi-Agent Imitation Learning (2026, cites this paper)
Toward Fully Autonomous Driving: AI, Challenges, Opportunities, and Needs (2026, cites this paper)
Offline Reinforcement Learning of High-Quality Behaviors Under Robust Style Alignment (2026, cites this paper)
RoboStriker: Hierarchical Decision-Making for Autonomous Humanoid Boxing (2026, cites this paper)
THINKSAFE: Self-Generated Safety Alignment for Reasoning Models (2026, cites this paper)
IRL-DAL: Safe and Adaptive Trajectory Planning for Autonomous Driving via Energy-Guided Diffusion Models (2026, cites this paper)
FluxNet: Learning Capacity-Constrained Local Transport Operators for Conservative and Bounded PDE Surrogates (2026, cites this paper)
Didactic to Constructive: Turning Expert Solutions into Learnable Reasoning (2026, cites this paper)
RPL: Learning Robust Humanoid Perceptive Locomotion on Challenging Terrains (2026, cites this paper)
Embodiment-Aware Generalist Specialist Distillation for Unified Humanoid Whole-Body Control (2026, cites this paper)
Language Movement Primitives: Grounding Language Models in Robot Motion (2026, cites this paper)
Self-supervised Physics-Informed Manipulation of Deformable Linear Objects with Non-negligible Dynamics (2026, cites this paper)
VLS: Steering Pretrained Robot Policies via Vision-Language Models (2026, cites this paper)
HoloBrain-0 Technical Report (2026, cites this paper)