Online Learning under Delayed Feedback

Pooria Joulani, A. György, Csaba Szepesvari

Published 2013 in International Conference on Machine Learning

ABSTRACT

Online learning with delayed feedback has received increasing attention recently because of its applications in distributed and web-based learning problems. In this paper we provide a systematic study of the topic and analyze the effect of delay on the regret of online learning algorithms. Somewhat surprisingly, delay increases the regret multiplicatively in adversarial problems and additively in stochastic problems. We give meta-algorithms that transform, in a black-box fashion, algorithms developed for the non-delayed case into ones that can handle delays in the feedback loop. We also develop modifications of the well-known UCB algorithm for the bandit problem with delayed feedback; their advantage over the meta-algorithms is that they can be implemented with lower complexity.
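
To make the black-box idea concrete, here is a minimal Python sketch of one way a non-delayed bandit learner could be wrapped to cope with delayed rewards in a stochastic setting. The abstract does not spell out the construction, so everything here is an assumption made for illustration: the class names `UCB1` and `QueuedDelayWrapper`, the helper `simulate`, the fixed delay, and the Bernoulli rewards. The wrapper stores rewards that arrive late in per-arm queues and replays them to the base learner whenever it asks for that arm; it is not claimed to be the paper's exact meta-algorithm.

```python
import math
import random
from collections import defaultdict, deque


class UCB1:
    """Standard UCB1 for the non-delayed setting (the black-box base learner)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # feedbacks processed per arm
        self.sums = [0.0] * n_arms   # sum of processed rewards per arm
        self.t = 0                   # total feedbacks processed

    def select(self):
        # Try every arm once before using the confidence index.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        return max(range(len(self.counts)), key=self._index)

    def _index(self, arm):
        mean = self.sums[arm] / self.counts[arm]
        return mean + math.sqrt(2.0 * math.log(self.t) / self.counts[arm])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
        self.t += 1


class QueuedDelayWrapper:
    """Hypothetical wrapper: buffers late rewards and replays them to the base learner."""

    def __init__(self, base, n_arms):
        self.base = base
        self.queues = [deque() for _ in range(n_arms)]

    def receive(self, arm, reward):
        # A delayed reward has arrived; store it until the base learner asks for this arm.
        self.queues[arm].append(reward)

    def select(self):
        while True:
            arm = self.base.select()
            if not self.queues[arm]:
                return arm  # nothing stored for this arm: pull it for real
            # Serve the base learner from the buffer and ask it again.
            self.base.update(arm, self.queues[arm].popleft())


def simulate(means, horizon, delay, seed=0):
    """Run the wrapped learner on Bernoulli arms whose rewards arrive `delay` rounds late."""
    rng = random.Random(seed)
    n_arms = len(means)
    learner = QueuedDelayWrapper(UCB1(n_arms), n_arms)
    arrivals = defaultdict(list)  # round -> rewards that become visible at its start
    pulls = [0] * n_arms
    for t in range(horizon):
        for arm, reward in arrivals.pop(t, []):
            learner.receive(arm, reward)
        arm = learner.select()
        pulls[arm] += 1
        reward = 1.0 if rng.random() < means[arm] else 0.0
        arrivals[t + 1 + delay].append((arm, reward))
    return pulls


print(simulate([0.2, 0.8], horizon=2000, delay=10))
```

The base learner never sees the delay: from its point of view each requested arm is answered immediately, either from the buffer or, later, by a real pull. Setting `delay=0` recovers ordinary UCB1 behaviour, which gives a simple baseline for comparing pull counts.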

PUBLICATION RECORD

Publication year
2013
Venue
International Conference on Machine Learning
Publication date
2013-06-04
Fields of study
Mathematics, Computer Science
Identifiers
DOI 10.14288/1.0044651; arXiv 1306.0686
Source metadata
Semantic Scholar

CITED BY

Decentralized Online Convex Optimization with Unknown Feedback Delays
2026, cites this paper
Parameter-free Dynamic Regret: Time-varying Movement Costs, Delayed Feedback, and Memory
2026, cites this paper
Linear Convergence in Games with Delayed Feedback via Extra Prediction
2026, cites this paper
Near-Optimal Regret for Distributed Adversarial Bandits: A Black-Box Approach
2026, cites this paper
Inventory-constrained online learning for revenue management with delayed feedback
2026, cites this paper
Statistical Learning from Attribution Sets
2026, cites this paper
A Reduction from Delayed to Immediate Feedback for Online Convex Optimization with Improved Guarantees
2026, cites this paper
Integrating Multi-Armed Bandit, Active Learning, and Distributed Computing for Scalable Optimization
2026, cites this paper
Learning to Balance Utility and Delay in Bipartite Queueing Networks With Sample Path Constraints
2026, cites this paper
Distributed Perceptron under Bounded Staleness, Partial Participation, and Noisy Communication
2026, cites this paper
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
2025, influential citation
Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs
2025, cites this paper
Bandit and Delayed Feedback in Online Structured Prediction
2025, influential citation
Improved Best-of-Both-Worlds Regret for Bandits with Delayed Feedback
2025, influential citation
Lipschitz Bandits with Stochastic Delayed Feedback
2025, influential citation
Optimistic Learning for Communication Networks
2025, cites this paper
Contextual Relevance and Adaptive Sampling for LLM-Based Document Reranking
2025, cites this paper
Multi-agent Adaptive Mechanism Design
2025, cites this paper
Neural Contextual Bandits Under Delayed Feedback Constraints
2025, cites this paper
Missing Data Multiple Imputation for Tabular Q-Learning in Online RL
2025, cites this paper
Learning from Delayed Feedback in Games via Extra Prediction
2025, cites this paper
Learning-Augmented Control: Adaptively Confidence Learning for Competitive MPC
2025, cites this paper
Multi-player Multi-armed Bandits with Delayed Feedback
2025, cites this paper
On the constrained online convex optimization with feedback delay
2025, cites this paper
Revisiting Multi-Agent Asynchronous Online Optimization with Delays: the Strongly Convex Case
2025, cites this paper
Contextual Linear Bandits with Delay as Payoff
2025, cites this paper
Online Learning in the Random Order Model
2025, cites this paper
Similarity = Value? Consultation Value Assessment and Alignment for Personalized Search
2025, cites this paper
Budgeted-Bandits with Controlled Restarts with Applications in Learning and Computing
2025, cites this paper
Sublinear Dynamic Regrets for Aggregative Games With Feedback Delays and Communication Delays
2025, cites this paper
Exploiting Curvature in Online Convex Optimization with Delayed Feedback
2025, influential citation
Smoothed Online Convex Optimization with Delayed Feedback
2025, cites this paper
Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays
2024, cites this paper
Faster Stochastic Optimization with Arbitrary Delays via Asynchronous Mini-Batching
2024, cites this paper
Learning treatment effects while treating those in need
2024, cites this paper
Generalization bounds for mixing processes via delayed online-to-PAC conversions
2024, cites this paper
Delay as Payoff in MAB
2024, cites this paper
Adversarial Online Learning with Temporal Feedback Graphs
2024, cites this paper
Budgeted Recommendation with Delayed Feedback
2024, cites this paper
Communication-Efficient Regret-Optimal Distributed Online Convex Optimization
2024, influential citation
Non-stochastic Bandits With Evolving Observations
2024, cites this paper
Biased Dueling Bandits with Stochastic Delayed Feedback
2024, cites this paper
Budget-Constrained and Deadline-Driven Multi-Armed Bandits with Delays
2024, cites this paper
Backlogged Bandits: Cost-Effective Learning for Utility Maximization in Queueing Networks
2024, influential citation
Ensure Timeliness and Accuracy: A Novel Sliding Window Data Stream Paradigm for Live Streaming Recommendation
2024, cites this paper
Learning with Asynchronous Labels
2024, influential citation
Online Sequential Decision-Making with Unknown Delays
2024, cites this paper
Improved Regret for Bandit Convex Optimization with Delayed Feedback
2024, cites this paper
Predictive Linear Online Tracking for Unknown Targets
2024, cites this paper
Energy-Efficient Computation Peer Offloading in Satellite Edge Computing Networks
2024, cites this paper
Stochastic Multi-Armed Bandits with Strongly Reward-Dependent Delays
2024, influential citation
Handling Delayed Feedback in Distributed Online Optimization: A Projection-Free Approach
2024, cites this paper
Misalignment, Learning, and Ranking: Harnessing Users Limited Attention
2024, cites this paper
A Short Survey on Importance Weighting for Machine Learning
2024, cites this paper
Distributed Online Mirror Descent With Delayed Subgradient and Event-Triggered Communications
2024, cites this paper
Constant step-size stochastic approximation with delayed updates
2024, cites this paper
Non-stationary Online Convex Optimization with Arbitrary Delays
2024, influential citation
Ensuring a Semantically Effective IoT Network Through Blockchain with Delayed Feedback
2024, cites this paper
Analyzing and Enhancing Queue Sampling for Energy-Efficient Remote Control of Bandits
2024, cites this paper
Interactive and Hybrid Imitation Learning: Provably Beating Behavior Cloning
2024, cites this paper
Online-to-PAC generalization bounds under graph-mixing dependencies
2024, cites this paper
Delayed feedback in online non-convex optimization: A non-stationary approach with applications
2024, cites this paper
Counterfactual contextual bandit for recommendation under delayed feedback
2024, cites this paper
Online Residual Learning from Offline Experts for Pedestrian Tracking
2024, cites this paper
Online Composite Optimization Between Stochastic and Adversarial Environments
2024, cites this paper
Delayed MDPs with Feature Mapping
2024, cites this paper
Risk-averse learning with delayed feedback
2024, cites this paper
Online Bandit Convex Optimization with Stochastic Constraints and Delays
2024, cites this paper
QuACK: A Multipurpose Queuing Algorithm for Cooperative k-Armed Bandits
2024, cites this paper
Online Matching: A Real-time Bandit System for Large-scale Recommendations
2023, cites this paper
Beta Upper Confidence Bound Policy for the Design of Clinical Trials
2023, cites this paper
A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays
2023, cites this paper
Energy-Aware Spreading Factor Selection in LoRaWAN Using Delayed-Feedback Bandits
2023, cites this paper
Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches
2023, cites this paper
Online Conversion Rate Prediction via Neural Satellite Networks in Delayed Feedback Advertising
2023, cites this paper
Non-Stationary Delayed Combinatorial Semi-Bandit With Causally Related Rewards
2023, cites this paper
A New Framework: Short-Term and Long-Term Returns in Stochastic Multi-Armed Bandit
2023, cites this paper
Delayed Bandits: When Do Intermediate Observations Help?
2023, cites this paper
Contextual Bandits with Budgeted Information Reveal
2023, cites this paper
Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay
2023, cites this paper
Efficient Reinforcement Learning With Impaired Observability: Learning to Act With Delayed and Missing State Observations
2023, cites this paper
User-Oriented Edge Node Grouping in Mobile Edge Computing
2023, cites this paper
Non-ergodic linear convergence property of the delayed gradient descent under the strongly convexity and the Polyak-Łojasiewicz condition
2023, cites this paper
A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs
2023, cites this paper
Cascading Bandits: Optimizing Recommendation Frequency in Delayed Feedback Environments
2023, cites this paper
Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards
2023, influential citation
Adaptive Experimentation at Scale: Bayesian Algorithms for Flexible Batches
2023, cites this paper
Stochastic Submodular Bandits With Delayed Composite Anonymous Bandit Feedback
2023, cites this paper
Efficient RL with Impaired Observability: Learning to Act with Delayed and Missing State Observations
2023, cites this paper
Delayed Feedback in Kernel Bandits
2023, cites this paper
A Reduction-based Framework for Sequential Decision Making with Delayed Feedback
2023, cites this paper
Is Stochastic Mirror Descent Vulnerable to Adversarial Delay Attacks? A Traffic Assignment Resilience Study
2023, cites this paper
Non-stationary Delayed Online Convex Optimization: From Full-information to Bandit Setting
2023, influential citation
Overcoming Delayed Feedback via Overlook Decision Making
2023, cites this paper
Event-triggered distributed online convex optimization with delayed bandit feedback
2023, cites this paper
Adversarial Bandits With Multi-User Delayed Feedback: Theory and Application
2023, cites this paper
Label Delay in Online Continual Learning
2023, cites this paper
Reward innovation for long-term member satisfaction
2023, cites this paper
Remote Control of Bandits Over Queues - Relevance of Information Freshness
2023, influential citation
RLQ: Workload Allocation With Reinforcement Learning in Distributed Queues
2023, cites this paper