Efficient Optimal Learning for Contextual Bandits

Miroslav Dudík, Daniel J. Hsu, Satyen Kale, Nikos Karampatziakis, J. Langford, L. Reyzin, Tong Zhang

Published 2011 in Conference on Uncertainty in Artificial Intelligence

ABSTRACT

We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives a reward for the action taken. We provide the first efficient algorithm with optimal regret. Our algorithm uses a cost-sensitive classification learner as an oracle and has a running time of polylog(N), where N is the number of classification rules among which the oracle might choose. This is exponentially faster than all previous algorithms that achieve optimal regret in this setting. Our formulation also enables us to create an algorithm whose regret is additive in the feedback delay, rather than multiplicative as in all previous work.
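
The interaction protocol described in the abstract (observe features, pick an action, see only that action's reward) can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a plain epsilon-greedy baseline, and the names argmax_oracle, run_epsilon_greedy, draw_context and draw_reward are hypothetical, introduced only for illustration. The oracle here simply enumerates all N policies, so it does not have the polylog(N) running time the abstract refers to; it only shows where an oracle call would sit in the loop.

```python
import random


def argmax_oracle(policies, data):
    """Brute-force stand-in for a cost-sensitive classification oracle.

    `data` holds (context, estimated_reward_vector) pairs. The policy with
    the highest total estimated reward is returned (ties go to the earliest).
    It enumerates every policy, so it is NOT polylog(N).
    """
    return max(policies, key=lambda pi: sum(est[pi(x)] for x, est in data))


def run_epsilon_greedy(policies, num_actions, draw_context, draw_reward,
                       rounds, epsilon=0.1, seed=0):
    """Epsilon-greedy over a finite policy class (illustrative baseline only)."""
    rng = random.Random(seed)
    data = []            # logged (context, importance-weighted rewards)
    total_reward = 0.0
    for _ in range(rounds):
        x = draw_context(rng)                       # observe features
        greedy = argmax_oracle(policies, data)(x)   # oracle call on the log
        # Mix the greedy action with uniform exploration.
        probs = [epsilon / num_actions] * num_actions
        probs[greedy] += 1.0 - epsilon
        a = rng.choices(range(num_actions), weights=probs)[0]
        r = draw_reward(rng, x, a)                  # only this reward is seen
        total_reward += r
        # Inverse-propensity estimate: zero for unobserved actions,
        # r / P(a) for the chosen one, which is unbiased in expectation.
        est = [0.0] * num_actions
        est[a] = r / probs[a]
        data.append((x, est))
    return total_reward, argmax_oracle(policies, data)


# Toy problem (assumed for the demo): two contexts, two actions,
# reward 1 when the action matches the context.
policies = [lambda x: x, lambda x: 1 - x, lambda x: 0, lambda x: 1]
total, best = run_epsilon_greedy(
    policies, num_actions=2,
    draw_context=lambda rng: rng.randrange(2),
    draw_reward=lambda rng, x, a: 1.0 if a == x else 0.0,
    rounds=200,
)
```

Dividing the observed reward by the probability of the chosen action is what lets a single logged interaction score every policy in the class, which is the kind of input a cost-sensitive classification oracle would consume.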

PUBLICATION RECORD

Publication year: 2011
Venue: Conference on Uncertainty in Artificial Intelligence
Publication date: 2011-06-13
Fields of study: Mathematics, Computer Science
Identifiers: arXiv:1106.2369
Source metadata: Semantic Scholar

REFERENCES

Contextual Bandit Algorithms with Supervised Learning Guarantees
2010, cited by this paper
An Optimal High Probability Algorithm for the Contextual Bandit Problem
2010, cited by this paper
Error-Correcting Tournaments
2009, cited by this paper
Slow Learners are Fast
2009, cited by this paper
The Epoch-Greedy algorithm for contextual multi-armed bandits
2007, influential reference
Adaptive Online Gradient Descent
2007, cited by this paper
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
2006, cited by this paper
Efficient algorithms for online decision problems
2005, cited by this paper
From Batch to Transductive Online Learning
2005, cited by this paper
Efficient Algorithms for Online Decision Problems
2003, cited by this paper
Using Confidence Bounds for Exploitation-Exploration Trade-offs
2003, cited by this paper
Finite-time Analysis of the Multiarmed Bandit Problem
2002, influential reference
The Nonstochastic Multiarmed Bandit Problem
2002, influential reference
A decision-theoretic generalization of on-line learning and an application to boosting
1997, influential reference
Asymptotically efficient adaptive allocation rules
1985, cited by this paper
On Tail Probabilities for Martingales
1975, cited by this paper
On general minimax theorems
1958, influential reference

CITED BY

Group-realizable multi-group learning by minimizing empirical risk
2026, cites this paper
Optimal Regret for Policy Optimization in Contextual Bandits
2026, cites this paper
A Simple Reduction Scheme for Constrained Contextual Bandits with Adversarial Contexts via Regression
2026, cites this paper
Taming the Monster Every Context: Complexity Measure and Unified Framework for Offline-Oracle Efficient Contextual Bandits
2026, cites this paper
Improved Best-of-Both-Worlds Regret for Bandits with Delayed Feedback
2025, cites this paper
Regret Bounds for Adversarial Contextual Bandits with General Function Approximation and Delayed Feedback
2025, cites this paper
The Minimal Search Space for Conditional Causal Bandits
2025, cites this paper
Contextual Linear Bandits with Delay as Payoff
2025, cites this paper
Practical Contextual Bandits for Large-Scale Structured Discrete Constrained Optimization Problems
2025, cites this paper
No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need!
2025, cites this paper
Learning to Price with Resource Constraints: From Full Information to Machine-Learned Prices
2025, cites this paper
From Contextual Combinatorial Semi-Bandits to Bandit List Classification: Improved Sample Complexity with Sparse Rewards
2025, cites this paper
The Real Price of Bandit Information in Multiclass Classification
2024, cites this paper
Efficient Contextual Bandits with Uninformed Feedback Graphs
2024, cites this paper
Data-driven Error Estimation: Excess Risk Bounds without Class Complexity as Input
2024, cites this paper
Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff
2024, cites this paper
Delay as Payoff in MAB
2024, cites this paper
Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays
2024, cites this paper
An Information Theoretic Approach to Interaction-Grounded Learning
2024, cites this paper
How Does Variance Shape the Regret in Contextual Bandits?
2024, cites this paper
Faster Stochastic Optimization with Arbitrary Delays via Asynchronous Mini-Batching
2024, cites this paper
Contextual Bandits for Unbounded Context Distributions
2024, cites this paper
Stochastic Multi-Armed Bandits with Strongly Reward-Dependent Delays
2024, cites this paper
Optimizing contextual bandit hyperparameters: A dynamic transfer learning-based framework
2024, cites this paper
Stochastic Constrained Contextual Bandits via Lyapunov Optimization Based Estimation to Decision Framework
2024, cites this paper
Analyzing and Enhancing Queue Sampling for Energy-Efficient Remote Control of Bandits
2024, cites this paper
A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs
2023, cites this paper
An Asymptotically Optimal Algorithm for the One-Dimensional Convex Hull Feasibility Problem
2023, cites this paper
Accelerating exploration and representation learning with offline pre-training
2023, cites this paper
Stochastic Contextual Bandits with Graph-based Contexts
2023, cites this paper
Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards
2023, cites this paper
Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback
2023, cites this paper
Cascading Bandits: Optimizing Recommendation Frequency in Delayed Feedback Environments
2023, cites this paper
Data-Driven Online Recommender Systems With Costly Information Acquisition
2023, cites this paper
Online Bidding in Repeated Non-Truthful Auctions under Budget and ROI Constraints
2023, cites this paper
Remote Control of Bandits Over Queues - Relevance of Information Freshness
2023, cites this paper
Federated Offline Policy Learning
2023, cites this paper
An Asymptotically Optimal Algorithm for the Convex Hull Membership Problem
2023, cites this paper
Adversarial Bandits With Multi-User Delayed Feedback: Theory and Application
2023, cites this paper
A New Framework: Short-Term and Long-Term Returns in Stochastic Multi-Armed Bandit
2023, cites this paper
Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning
2023, cites this paper
Graph Feedback via Reduction to Regression
2023, cites this paper
Statistical complexity and optimal algorithms for nonlinear ridge bandits
2023, cites this paper
Beyond UCB: Statistical Complexity and Optimal Algorithms for Non-linear Ridge Bandits
2023, cites this paper
A Reduction-based Framework for Sequential Decision Making with Delayed Feedback
2023, cites this paper
An Improved Relaxation for Oracle-Efficient Adversarial Contextual Bandits
2023, cites this paper
Learning to bid and rank together in recommendation systems
2023, cites this paper
Effective Dimension in Bandit Problems under Censorship
2023, cites this paper
Delayed Feedback in Kernel Bandits
2023, cites this paper
Online Learning in Contextual Second-Price Pay-Per-Click Auctions
2023, cites this paper
Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization
2023, cites this paper
Importance-Weighted Offline Learning Done Right
2023, cites this paper
Efficient Online Clustering with Moving Costs
2023, cites this paper
Efficient RL with Impaired Observability: Learning to Act with Delayed and Missing State Observations
2023, cites this paper
Tracking Most Significant Shifts in Nonparametric Contextual Bandits
2023, cites this paper
Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits
2023, cites this paper
Adaptive experiments toward learning treatment effect heterogeneity
2023, cites this paper
Practical Contextual Bandits with Feedback Graphs
2023, cites this paper
Online Learning under Budget and ROI Constraints and Applications to Bidding in Non-Truthful Auctions
2023, cites this paper
Context-lumpable stochastic bandits
2023, cites this paper
Efficient Reinforcement Learning With Impaired Observability: Learning to Act With Delayed and Missing State Observations
2023, cites this paper
Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits
2023, cites this paper
An Optimization-based Algorithm for Non-stationary Kernel Bandits without Prior Knowledge
2022, cites this paper
Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts
2022, cites this paper
Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits
2022, cites this paper
Interaction-Grounded Learning with Action-inclusive Feedback
2022, cites this paper
Learning to Plan Variable Length Sequences of Actions with a Cascading Bandit Click Model of User Feedback
2022, cites this paper
Best of Many Worlds Guarantees for Online Learning with Knapsacks
2022, cites this paper
Online Caching Networks with Adversarial Guarantees
2022, cites this paper
Dynamic path learning in decision trees using contextual bandits
2022, cites this paper
Some performance considerations when using multi-armed bandit algorithms in the presence of missing data
2022, cites this paper
Thompson Sampling with Unrestricted Delays
2022, cites this paper
Online Learning with Knapsacks: the Best of Both Worlds
2022, cites this paper
Contextual Decision-Making with Knapsacks Beyond the Worst Case
2022, cites this paper
Partial Likelihood Thompson Sampling
2022, influential citation
Low-Rank Representation of Reinforcement Learning Policies
2022, cites this paper
On Efficient Online Imitation Learning via Classification
2022, cites this paper
Making Decisions under Outcome Performativity
2022, cites this paper
Adaptive Oracle-Efficient Online Learning
2022, cites this paper
Delayed Feedback in Generalised Linear Bandits Revisited
2022, cites this paper
Sales Channel Optimization via Simulations Based on Observational Data with Delayed Rewards: A Case Study at LinkedIn
2022, cites this paper
Optimal Contextual Bandits with Knapsacks under Realizibility via Regression Oracles
2022, influential citation
Stochastic continuum-armed bandits with additive models: Minimax regrets and adaptive algorithm
2022, cites this paper
Instance-optimal PAC Algorithms for Contextual Bandits
2022, influential citation
Adaptivity and Confounding in Multi-Armed Bandit Experiments
2022, cites this paper
Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracle
2022, cites this paper
Human-Like Multimodal Perception and Purposeful Manipulation for Deformable Objects
2022, cites this paper
Adaptive Experimentation in the Presence of Exogenous Nonstationary Variation
2022, cites this paper
Contextual Bandits with Large Action Spaces: Made Practical
2022, cites this paper
Adapting to misspecification in contextual bandits with offline regression oracles
2021, cites this paper
Maillard Sampling: Boltzmann Exploration Done Optimally
2021, cites this paper
Delayed Feedback in Episodic Reinforcement Learning
2021, cites this paper
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
2021, cites this paper
Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously
2021, cites this paper
Boosting for Online Convex Optimization
2021, cites this paper
Adapting to Misspecification in Contextual Bandits
2021, cites this paper
Sayer: Using Implicit Feedback to Improve System Policies
2021, cites this paper
Optimism and Delays in Episodic Reinforcement Learning
2021, influential citation
Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature
2021, cites this paper
Online Caching Networks with Adversarial Guarantees
2021, cites this paper