Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously

Published 2019 in International Conference on Machine Learning

ABSTRACT

We develop the first general semi-bandit algorithm that simultaneously achieves $\mathcal{O}(\log T)$ regret for stochastic environments and $\mathcal{O}(\sqrt{T})$ regret for adversarial environments without knowledge of the regime or the number of rounds $T$. The leading problem-dependent constants of our bounds are not only optimal in some worst-case sense studied previously, but also optimal for two concrete instances of semi-bandit problems. Our algorithm and analysis extend the recent work of Zimmert & Seldin (2019) for the special case of multi-armed bandits, but importantly require a novel hybrid regularizer designed specifically for semi-bandits. Experimental results on synthetic data show that our algorithm indeed performs well uniformly over different environments. We finally provide a preliminary extension of our results to the full-bandit feedback setting.
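The abstract does not state the update rule, so the Python below is only a minimal sketch of the general recipe it alludes to: follow-the-regularized-leader (FTRL) over the m-set polytope $\{x \in [0,1]^d : \sum_i x_i = m\}$, fed with importance-weighted loss estimates from semi-bandit feedback. The per-coordinate regularizer $f(x) = -\sqrt{x} + \gamma(1-x)\log(1-x)$ is an assumed hybrid that pairs the Tsallis-1/2 term used by Tsallis-INF (Zimmert & Seldin) with an entropy-like term on $1-x_i$. The constant `gamma`, the anytime learning rate $\eta_t = 1/\sqrt{t}$, the use of systematic sampling to draw the played set, and all function names are illustrative assumptions, not the paper's exact algorithm.

```python
import math
import random


def f_prime(x, gamma):
    """Derivative of the ASSUMED hybrid regularizer (per coordinate)
    f(x) = -sqrt(x) + gamma * (1 - x) * log(1 - x) on (0, 1)."""
    return -0.5 / math.sqrt(x) - gamma * (math.log(1.0 - x) + 1.0)


def inverse_f_prime(y, gamma, iters=60):
    """Solve f_prime(x) = y on (0, 1) by bisection; f_prime is increasing
    because f is strictly convex."""
    lo, hi = 1e-15, 1.0 - 1e-15
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f_prime(mid, gamma) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ftrl_marginals(cum_loss, m, eta, gamma, iters=60):
    """FTRL over {x in [0,1]^d : sum(x) = m}:
        x = argmin <x, L> + (1/eta) * sum_i f(x_i).
    Stationarity gives f'(x_i) = eta * (lam - L_i); lam is found by bisection
    so that the marginals sum to m (requires 0 < m < d)."""
    def marginals(lam):
        return [inverse_f_prime(eta * (lam - L), gamma) for L in cum_loss]

    lo, hi, step = min(cum_loss) - 1.0, max(cum_loss) + 1.0, 1.0
    while sum(marginals(lo)) > m:  # widen until sum(x(lo)) <= m
        step *= 2.0
        lo -= step
    step = 1.0
    while sum(marginals(hi)) < m:  # widen until sum(x(hi)) >= m
        step *= 2.0
        hi += step
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if sum(marginals(lam)) < m:
            lo = lam
        else:
            hi = lam
    return marginals(0.5 * (lo + hi))


def sample_m_set(x, rng):
    """Systematic sampling: with sum(x) = m and every x_i < 1, returns m
    distinct indices such that P(i is chosen) = x_i."""
    u = rng.random()
    chosen, cum = [], 0.0
    for i, xi in enumerate(x):
        # Item i owns [cum, cum + xi); it is chosen iff some point u + k lands there.
        if math.ceil(cum + xi - u) > math.ceil(cum - u):
            chosen.append(i)
        cum += xi
    return chosen


def play(losses, m, gamma=1.0, seed=0):
    """Run the sketch on a loss matrix losses[t][i] in [0, 1]; return total loss."""
    rng = random.Random(seed)
    cum_hat = [0.0] * len(losses[0])
    total = 0.0
    for t, loss in enumerate(losses, start=1):
        eta = 1.0 / math.sqrt(t)  # anytime rate: no knowledge of T needed
        x = ftrl_marginals(cum_hat, m, eta, gamma)
        for i in sample_m_set(x, rng):
            total += loss[i]
            # Semi-bandit feedback: only chosen arms are observed, and the
            # importance weight 1 / x_i keeps the loss estimate unbiased.
            cum_hat[i] += loss[i] / x[i]
    return total
```

For example, `play([[0.0, 0.0, 1.0, 1.0]] * 30, m=2)` runs 30 rounds in which arms 0 and 1 are always best. Because $f'$ diverges at both ends of $(0,1)$, the box constraints never bind, so the only multiplier to solve for is the one enforcing $\sum_i x_i = m$; this is why a one-dimensional bisection suffices.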

PUBLICATION RECORD

Publication year: 2019
Venue: International Conference on Machine Learning
Publication date: 2019-01-25
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1901.08779
Source metadata: Semantic Scholar

REFERENCES

TopRank: A practical algorithm for online stochastic ranking
2018, cited by this paper
An Optimal Algorithm for Stochastic and Adversarial Bandits
2018, influential reference
Adaptation to Easy Data in Prediction with Limited Advice
2018, cited by this paper
More Adaptive Algorithms for Adversarial Bandits
2018, influential reference
Sparsity, variance and curvature in multi-armed bandits
2017, cited by this paper
An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits
2017, cited by this paper
Minimal Exploration in Structured Stochastic Bandits
2017, influential reference
The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits
2016, cited by this paper
Introduction to Online Convex Optimization
2016, cited by this paper
An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits
2016, cited by this paper
Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning
2016, cited by this paper
Combinatorial Bandits Revisited
2015, cited by this paper
Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
2015, influential reference
Fighting Bandits with a New Kind of Smoothness
2015, cited by this paper
First-order regret bounds for combinatorial semi-bandits
2015, cited by this paper
Achieving All with No Parameters: AdaNormalHedge
2015, cited by this paper
One Practical Algorithm for Both Stochastic and Adversarial Bandits
2014, cited by this paper
A second-order bound with excess losses
2014, cited by this paper
An Efficient Algorithm for Learning with Semi-bandit Feedback
2013, cited by this paper
Thompson Sampling for Complex Online Problems
2013, influential reference
Combinatorial Multi-Armed Bandit: General Framework and Applications
2013, cited by this paper
A generalized online mirror descent with applications to classification and regression
2013, cited by this paper
Online Prediction under Submodular Constraints
2012, cited by this paper
Combinatorial Bandits
2012, cited by this paper
Combinatorial Network Optimization with Unknown Variables: Multi-Armed Bandits with Linear Rewards
2010, cited by this paper
Hedging Structured Concepts
2010, cited by this paper
Minimax Policies for Adversarial and Stochastic Bandits
2009, cited by this paper
Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
2008, cited by this paper
The Price of Bandit Information for Online Optimization
2007, cited by this paper
Convex Analysis and Optimization
2006, cited by this paper
The Nonstochastic Multiarmed Bandit Problem
2002, cited by this paper
Mathematics of Operations Research
1998, cited by this paper
A decision-theoretic generalization of on-line learning and an application to boosting
1997, cited by this paper
Asymptotically efficient adaptive allocation rules
1985, cited by this paper
The Best of Both Worlds: Stochastic and Adversarial Bandits (25th Annual Conference on Learning Theory)
2012, cited by this paper
Bounded regret in stochastic multi-armed bandits (JMLR: Workshop and Conference Proceedings)
2013, cited by this paper

CITED BY

Transient Resource Provisioning for Connected Autonomous Vehicles-Oriented Edge Slicing: A Learning-Based Two-Timescale Approach
2026, cites this paper
Multi-Play Combinatorial Semi-Bandit Problem
2025, cites this paper
Revisiting Follow-the-Perturbed-Leader with Unbounded Perturbations in Bandit Problems
2025, cites this paper
Efficient Best-of-Both-Worlds Algorithms for Contextual Combinatorial Semi-Bandits
2025, influential citation
Near-Optimal Regret for Efficient Stochastic Combinatorial Semi-Bandits
2025, influential citation
Note on Follow-the-Perturbed-Leader in Combinatorial Semi-Bandit Problems
2025, cites this paper
Follow-the-Perturbed-Leader Approaches Best-of-Both-Worlds for the m-Set Semi-Bandit Problems
2025, influential citation
Efficient Near-Optimal Algorithm for Online Shortest Paths in Directed Acyclic Graphs with Bandit Feedback Against Adaptive Adversaries
2025, cites this paper
A Near-optimal, Scalable and Parallelizable Framework for Stochastic Bandits Robust to Adversarial Corruptions and Beyond
2025, influential citation
Last Iterate Analyses of FTRL in Stochastic Bandits
2025, cites this paper
Offline Learning for Combinatorial Multi-armed Bandits
2025, cites this paper
Adaptive Learning Rate for Follow-the-Regularized-Leader: Competitive Analysis and Best-of-Both-Worlds
2024, cites this paper
LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits
2024, cites this paper
Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring
2024, cites this paper
Corruption-Robust Linear Bandits: Minimax Optimality and Gap-Dependent Misspecification
2024, cites this paper
Optimism in the Face of Ambiguity Principle for Multi-Armed Bandits
2024, cites this paper
Adversarial Combinatorial Bandits with Switching Cost and Arm Selection Constraints
2024, influential citation
Learning-based Scheduling for Information Gathering with QoS Constraints
2024, cites this paper
Fair Probabilistic Multi-Armed Bandit With Applications to Network Optimization
2024, cites this paper
Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond
2024, cites this paper
A Simple and Adaptive Learning Rate for FTRL in Online Learning with Minimax Regret of $\Theta(T^{2/3})$ and its Application to Best-of-Both-Worlds
2024, cites this paper
Adversarial Combinatorial Bandits With Switching Costs
2024, cites this paper
Online Caching With Switching Cost and Operational Long-Term Constraints: An Online Learning Approach
2024, influential citation
A Blackbox Approach to Best of Both Worlds in Bandits and Beyond
2023, cites this paper
Comparison of Strategies for Honeypot Deployment
2023, cites this paper
An Exploration-by-Optimization Approach to Best of Both Worlds in Linear Bandits
2023, influential citation
Best-of-Both-Worlds Algorithms for Linear Contextual Bandits
2023, cites this paper
Master–Slave Deep Architecture for Top-K Multiarmed Bandits With Nonlinear Bandit Feedback and Diversity Constraints
2023, cites this paper
Non-Stationary Delayed Combinatorial Semi-Bandit With Causally Related Rewards
2023, cites this paper
Understanding the Role of Feedback in Online Learning with Switching Costs
2023, cites this paper
Further Adaptive Best-of-Both-Worlds Algorithm for Combinatorial Semi-Bandits
2023, influential citation
Learning and Collusion in Multi-unit Auctions
2023, cites this paper
A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs
2023, cites this paper
Best-of-three-worlds Analysis for Linear Bandits with Follow-the-regularized-leader Algorithm
2023, influential citation
Follow-the-Perturbed-Leader Achieves Best-of-Both-Worlds for Bandit Problems
2023, influential citation
Clustering of conversational bandits with posterior sampling for user preference learning and elicitation
2023, cites this paper
Best-of-Three-Worlds Linear Bandit Algorithm with Variance-Adaptive Regret Bounds
2023, cites this paper
Best of Both Worlds Policy Optimization
2023, cites this paper
Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning
2023, cites this paper
Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds
2022, cites this paper
Towards an Optimization Perspective for Bandits Problem
2022, cites this paper
A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs
2022, influential citation
Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs
2022, cites this paper
Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model
2022, influential citation
Linear Combinatorial Semi-Bandit with Causally Related Rewards
2022, cites this paper
Best-of-Both-Worlds Algorithms for Partial Monitoring
2022, cites this paper
Combinatorial Bandits with Linear Constraints: Beyond Knapsacks and Fairness
2022, cites this paper
Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits
2021, influential citation
Robust Wireless Scheduling under Arbitrary Channel Dynamics and Feedback Delay (Invited Paper)
2021, cites this paper
A Model Selection Approach for Corruption Robust Reinforcement Learning
2021, cites this paper
Self-Unaware Adversarial Multi-Armed Bandits With Switching Costs
2021, cites this paper
On Optimal Robustness to Adversarial Corruption in Online Decision Problems
2021, cites this paper
Parameter-Free Multi-Armed Bandit Algorithms with Hybrid Data-Dependent Regret Bounds
2021, cites this paper
Improved Analysis of the Tsallis-INF Algorithm in Stochastically Constrained Adversarial Bandits and Stochastic Bandits with Adversarial Corruptions
2021, cites this paper
Best-of-All-Worlds Bounds for Online Learning with Feedback Graphs
2021, cites this paper
Simple Combinatorial Algorithms for Combinatorial Bandits: Corruptions and Approximations
2021, influential citation
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
2021, cites this paper
Stochastic Graphical Bandits with Adversarial Corruptions
2021, cites this paper
Learning-Based Decentralized Offloading Decision Making in an Adversarial Environment
2021, cites this paper
Stochastic Dueling Bandits with Adversarial Corruption
2021, cites this paper
Improved Analysis of Robustness of the Tsallis-INF Algorithm to Adversarial Corruptions in Stochastic Multiarmed Bandits
2021, cites this paper
Combinatorial Bandits under Strategic Manipulations
2021, cites this paper
An Algorithm for Stochastic and Adversarial Bandits with Switching Costs
2021, cites this paper
Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously
2021, influential citation
Online Learning via Offline Greedy Algorithms: Applications in Market Design and Optimization
2021, cites this paper
Exploiting Easiness and Overcoming Delays in Online Learning
2020, cites this paper
Bandit Algorithms
2020, cites this paper
Contributions à l'apprentissage statistique : estimation de densité, agrégation d'experts et forêts aléatoires. (Contributions to statistical learning : density estimation, expert aggregation and random forests)
2020, cites this paper
No-regret Learning in Price Competitions under Consumer Reference Effects
2020, cites this paper
Corralling Stochastic Bandit Algorithms
2020, influential citation
Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition
2020, influential citation
Bandits with adversarial scaling
2020, cites this paper
Combinatorial Semi-Bandit in the Non-Stationary Environment
2020, cites this paper
Memory-Constrained No-Regret Learning in Adversarial Multi-Armed Bandits
2020, cites this paper
An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays
2019, cites this paper
Adaptivity, Variance and Separation for Adversarial Bandits
2019, influential citation
Equipping Experts/Bandits with Long-term Memory
2019, cites this paper
A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits
2019, influential citation
Introduction to Multi-Armed Bandits
2019, cites this paper
Better Algorithms for Stochastic Bandits with Adversarial Corruptions
2019, cites this paper
Exploration by Optimisation in Partial Monitoring
2019, cites this paper
Corruption Robust Exploration in Episodic Reinforcement Learning
2019, cites this paper
On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits
2019, cites this paper
On the optimality of the Hedge algorithm in the stochastic regime
2018, cites this paper
Anytime Hedge achieves optimal regret in the stochastic regime
2018, cites this paper
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits
2018, cites this paper
Best of both worlds
2010, influential citation