Delay-Adaptive Learning in Generalized Linear Contextual Bandits

Published 2020 in Mathematics of Operations Research

ABSTRACT

In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed. Instead, rewards become available to the decision maker only after some delay, which is unknown and stochastic. Such delayed feedback occurs in several active learning settings, including product recommendation, personalized medical treatment selection, bidding in first-price auctions, and bond trading in over-the-counter markets. We study the performance of two well-known algorithms adapted to this delayed setting: one based on upper confidence bounds and the other based on Thompson sampling. We describe how these two algorithms should be modified to handle delays and give regret characterizations for both. To the best of our knowledge, our regret bounds provide the first theoretical characterizations for generalized linear contextual bandits with large delays. Our results contribute to the broad landscape of contextual bandits literature by establishing that both algorithms can be made robust to delays, thereby helping clarify and reaffirm the empirical success of these two algorithms, which are widely deployed in modern recommendation engines.

Funding: This work was supported by the National Science Foundation [Grants 2118199, 1915967, and CCF-2106508], the Air Force Office of Scientific Research [Award FA9550-20-1-0397], a Digital Twin research grant from Bain & Company, and a faculty research grant from New York University’s Center for Global Economy and Business.
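The abstract describes adapting UCB and Thompson sampling to delayed rewards without spelling out the mechanics of the delayed-feedback loop. The following is a minimal, hypothetical Python sketch of that loop, not the authors' algorithms. It assumes a logistic link, Gaussian contexts, geometrically distributed delays (never revealed to the learner), and a learner that refits a ridge-regularized maximum-likelihood estimate on only the rewards that have already arrived. The function names, the exploration scale `alpha`, and the ridge parameter `lam` are illustrative assumptions; the paper's specific modifications to confidence widths and sampling distributions under delay are not reproduced here.

```python
import numpy as np


def sigmoid(z):
    # Logistic inverse link: expected Bernoulli reward given the linear score x^T theta.
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_mle(X, y, lam, theta0, iters=10):
    """Ridge-regularized logistic MLE by Newton's method, warm-started at theta0."""
    d = theta0.shape[0]
    theta = theta0.copy()
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + lam * theta
        hess = (X.T * (p * (1.0 - p))) @ X + lam * np.eye(d)
        theta = theta - np.linalg.solve(hess, grad)
    return theta


def run_delayed_glm_bandit(policy="ucb", T=2000, d=5, K=10, mean_delay=50.0,
                           alpha=1.0, lam=1.0, seed=0):
    """Hypothetical simulation of a logistic contextual bandit with delayed rewards.

    Only rewards that have already arrived are used for estimation; the delay
    distribution (geometric here, an assumption) is never shown to the learner.
    """
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=d)
    theta_star /= np.linalg.norm(theta_star)

    pending = []            # (arrival_round, context, reward) not yet revealed
    X_obs, y_obs = [], []   # contexts and rewards whose feedback has arrived
    theta_hat = np.zeros(d)
    regret = 0.0

    for t in range(T):
        # Reveal every reward whose delay has elapsed by round t.
        arrived = [item for item in pending if item[0] <= t]
        pending = [item for item in pending if item[0] > t]
        for _, x, r in arrived:
            X_obs.append(x)
            y_obs.append(r)

        # Refit on observed data only; V is the regularized Gram matrix of those contexts.
        X = np.array(X_obs).reshape(-1, d)
        y = np.array(y_obs, dtype=float)
        theta_hat = fit_logistic_mle(X, y, lam, theta_hat)
        V_inv = np.linalg.inv(lam * np.eye(d) + X.T @ X)

        contexts = rng.normal(size=(K, d)) / np.sqrt(d)
        if policy == "ucb":
            # Optimistic index on the linear score: estimate plus ellipsoidal width.
            widths = np.sqrt(np.einsum("kd,de,ke->k", contexts, V_inv, contexts))
            scores = contexts @ theta_hat + alpha * widths
        elif policy == "ts":
            # Gaussian sample around the MLE with covariance alpha^2 * V^{-1}.
            L = np.linalg.cholesky(V_inv)
            theta_tilde = theta_hat + alpha * (L @ rng.standard_normal(d))
            scores = contexts @ theta_tilde
        else:
            raise ValueError("policy must be 'ucb' or 'ts'")
        a = int(np.argmax(scores))

        means = sigmoid(contexts @ theta_star)
        regret += means.max() - means[a]      # expected (pseudo-)regret this round
        reward = float(rng.random() < means[a])
        delay = int(rng.geometric(1.0 / mean_delay))  # >= 1 round, hidden from learner
        pending.append((t + delay, contexts[a], reward))

    return regret, len(y_obs), theta_hat, theta_star
```

For example, `run_delayed_glm_bandit(policy="ts", mean_delay=200.0)` runs the same loop with longer delays. Refitting only on arrived rewards is the simplest choice that needs no knowledge of the delay distribution; how the paper inflates exploration to account for still-missing rewards, and the resulting regret bounds, are beyond this sketch.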

PUBLICATION RECORD

Publication year: 2020
Venue: Mathematics of Operations Research
Publication date: 2020-03-11
Fields of study: Mathematics, Computer Science
Identifiers: DOI 10.1287/moor.2023.1358; arXiv 2003.05174
Source metadata: Semantic Scholar

REFERENCES

High-dimensional Statistics: A Non-asymptotic Viewpoint (Martin J. Wainwright, Cambridge University Press, 2019)
2020, cited by this paper
Online Decision Making with High-Dimensional Covariates
2020, cited by this paper
Learning in Generalized Linear Contextual Bandits with Stochastic Delays
2019, influential reference
Deep Learning with Logged Bandit Feedback
2018, cited by this paper
Offline Multi-Action Policy Learning: Generalization and Optimization
2018, cited by this paper
Bernstein's inequalities for general Markov chains
2018, cited by this paper
Big Data and the Precision Medicine Revolution
2018, cited by this paper
Thompson Sampling for the MNL-Bandit
2017, cited by this paper
Efficient Policy Learning
2017, cited by this paper
Bandits with Delayed, Aggregated Anonymous Feedback
2017, cited by this paper
Provable Optimal Algorithms for Generalized Linear Contextual Bandits
2017, influential reference
Scalable Generalized Linear Bandits: Online Computation and Hashing
2017, cited by this paper
Customer Acquisition via Display Advertising Using Multi-Armed Bandit Experiments
2017, cited by this paper
Asymptotic Convergence in Online Learning with Unbounded Delays
2016, cited by this paper
Off-policy evaluation for slate recommendation
2016, cited by this paper
Delay and Cooperation in Nonstochastic Bandits
2016, cited by this paper
An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives
2015, cited by this paper
Online Learning with Adversarial Delays
2015, cited by this paper
The Queue Method: Handling Delay, Heuristics, Prior Data, and Evaluation in Bandits
2015, cited by this paper
Batch learning from logged bandit feedback through counterfactual risk minimization
2015, cited by this paper
Who should be Treated? Empirical Welfare Maximization Methods for Treatment Choice
2015, cited by this paper
Exact value for subgaussian norm of centered indicator random variable
2014, cited by this paper
Modeling delayed feedback in display advertising
2014, influential reference
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
2014, cited by this paper
An Information-Theoretic Analysis of Thompson Sampling
2014, cited by this paper
Learning to Optimize via Posterior Sampling
2013, cited by this paper
Online Learning under Delayed Feedback
2013, influential reference
Further Optimal Regret Bounds for Thompson Sampling
2012, cited by this paper
Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization
2012, cited by this paper
Thompson Sampling for Contextual Bandits with Linear Payoffs
2012, cited by this paper
Efficient Optimal Learning for Contextual Bandits
2011, cited by this paper
A Note on Performance Limitations in Bandit Problems With Side Information
2011, cited by this paper
Doubly Robust Policy Evaluation and Learning
2011, cited by this paper
An Empirical Evaluation of Thompson Sampling
2011, cited by this paper
Contextual Bandits with Linear Payoff Functions
2011, cited by this paper
Adaptive Design Methods in Clinical Trials, Second Edition
2011, cited by this paper
A contextual-bandit approach to personalized news article recommendation
2010, cited by this paper
Online Markov Decision Processes Under Bandit Feedback
2010, cited by this paper
Parametric Bandits: The Generalized Linear Case
2010, influential reference
Introduction to the non-asymptotic analysis of random matrices
2010, cited by this paper
Nonparametric Bandits with Covariates
2010, cited by this paper
Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms
2009, cited by this paper
On-line Learning with Delayed Label Feedback
2005, cited by this paper
On delayed prediction of individual sequences
2002, cited by this paper
Wireless commerce: marketing issues and possibilities
2001, cited by this paper
Generalized Linear Models
2001, cited by this paper
Strong consistency of maximum quasi-likelihood estimators in generalized linear models with fixed and adaptive designs
1999, cited by this paper
Generalized Linear Models
1984, cited by this paper
Conjugate Priors for Exponential Families
1979, cited by this paper

CITED BY

Inventory-constrained online learning for revenue management with delayed feedback
2026, cites this paper
Neural Contextual Bandits Under Delayed Feedback Constraints
2025, cites this paper
Contextual Linear Bandits with Delay as Payoff
2025, cites this paper
High-dimensional Nonparametric Contextual Bandit Problem
2025, cites this paper
Dynamic Care Unit Placements Under Unknown Demand with Learning
2025, cites this paper
Addressing Signal Delay in Deep Reinforcement Learning
2024, cites this paper
Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing
2022, cites this paper
Delayed Feedback in Generalised Linear Bandits Revisited
2022, cites this paper