Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

Published 2015 in International Conference on Machine Learning

ABSTRACT

We study the multiple-play multi-armed bandit (MAB) problem, in which several arms are selected at each round. Thompson sampling (TS), a randomized algorithm with a Bayesian spirit, has recently attracted much attention for its excellent empirical performance, and it has been shown to achieve an optimal regret bound in the standard single-play MAB problem. In this paper, we propose the multiple-play Thompson sampling (MP-TS) algorithm, an extension of TS to the multiple-play MAB problem, and analyze its regret. We prove that, for binary rewards, MP-TS has a regret upper bound that matches the regret lower bound of Anantharam et al. (1987) and is therefore optimal. MP-TS is thus the first computationally efficient algorithm with optimal regret for this problem. We also report computer simulations comparing MP-TS with state-of-the-art algorithms, and we propose a modification of MP-TS that performs better empirically.
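
The abstract does not spell out the MP-TS procedure, so the following is only a minimal sketch of how a multiple-play extension of Thompson sampling might look for binary rewards. It assumes Bernoulli arms, independent uniform Beta(1, 1) priors, and that each round plays the arms with the largest posterior samples. The names `mp_ts`, `true_means`, `num_plays`, and `horizon` are hypothetical, and the regret computed here is a simple pseudo-regret against the best fixed set of arms, chosen for illustration.

```python
import random


def mp_ts(true_means, num_plays, horizon, seed=0):
    """Sketch of multiple-play Thompson sampling on simulated Bernoulli arms.

    Assumptions (not taken from the abstract): Beta(1, 1) priors and
    selection of the `num_plays` arms with the largest posterior samples.
    """
    rng = random.Random(seed)
    num_arms = len(true_means)
    successes = [0] * num_arms  # observed rewards equal to 1, per arm
    failures = [0] * num_arms   # observed rewards equal to 0, per arm

    # Expected reward per round of the best fixed set of arms.
    best = sum(sorted(true_means, reverse=True)[:num_plays])
    regret = 0.0

    for _ in range(horizon):
        # Draw one sample from each arm's Beta posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(num_arms)
        ]
        # Play the arms whose samples are largest.
        chosen = sorted(range(num_arms), key=lambda i: samples[i],
                        reverse=True)[:num_plays]
        for i in chosen:
            reward = 1 if rng.random() < true_means[i] else 0
            successes[i] += reward
            failures[i] += 1 - reward
        # Accumulate pseudo-regret from the (simulated) true means.
        regret += best - sum(true_means[i] for i in chosen)

    return successes, failures, regret


if __name__ == "__main__":
    s, f, r = mp_ts([0.9, 0.8, 0.2, 0.1], num_plays=2, horizon=2000)
    pulls = [a + b for a, b in zip(s, f)]
    print("pulls per arm:", pulls)
    print("pseudo-regret:", round(r, 2))
```

Each round costs one posterior draw per arm plus a sort, which is consistent with the abstract's description of the algorithm as computationally efficient, although the paper itself should be consulted for the exact procedure and for the modified variant.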

PUBLICATION RECORD

Publication year
2015
Venue
International Conference on Machine Learning
Publication date
2015-06-02
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1506.00779
Source metadata
Semantic Scholar

REFERENCES

Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
2015, cited by this paper
Stochastic Regret Minimization via Thompson Sampling
2014, cited by this paper
Spectral Thompson Sampling
2014, cited by this paper
Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors
2013, cited by this paper
Thompson Sampling for Complex Bandit Problems
2013, influential reference
Combinatorial Multi-Armed Bandit: General Framework and Applications
2013, influential reference
Eluder Dimension and the Sample Complexity of Optimistic Exploration
2013, cited by this paper
Thompson Sampling for 1-Dimensional Exponential Family Bandits
2013, cited by this paper
(More) Efficient Reinforcement Learning via Posterior Sampling
2013, cited by this paper
Thompson Sampling for Contextual Bandits with Linear Payoffs
2012, cited by this paper
Truthful learning mechanisms for multi-slot sponsored search auctions with externalities
2012, cited by this paper
Further Optimal Regret Bounds for Thompson Sampling
2012, influential reference
Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis
2012, cited by this paper
Kullback–Leibler upper confidence bounds for optimal sequential allocation
2012, influential reference
An Empirical Evaluation of Thompson Sampling
2011, cited by this paper
The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond
2011, cited by this paper
A modern Bayesian look at the multi-armed bandit
2010, cited by this paper
Algorithms for Adversarial Bandit Problems with Multiple Plays
2010, cited by this paper
An Asymptotically Optimal Bandit Algorithm for Bounded Support Models
2010, cited by this paper
A Cascade Model for Externalities in Sponsored Search
2008, cited by this paper
An experimental comparison of click position-bias models
2008, cited by this paper
Opportunistic Spectrum Access in Cognitive Radio Networks
2008, cited by this paper
Sponsored Search Auctions with Markovian Users
2008, influential reference
Finite-time Analysis of the Multiarmed Bandit Problem
2002, influential reference
Optimal Adaptive Policies for Sequential Allocation Problems
1996, cited by this paper
Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards
1987, influential reference
Asymptotically efficient adaptive allocation rules
1985, cited by this paper
On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples
1933, cited by this paper
Analysis of Thompson Sampling for the Multi-armed Bandit Problem (25th Annual Conference on Learning Theory)
year unknown, influential reference
Asymptotically Efficient Adaptive Allocation Rules
year unknown, cited by this paper

CITED BY

RCFuzzer: Recommendation-based Collaborative Fuzzer
2025, cites this paper
Online Learning with Probing for Sequential User-Centric Selection
2025, cites this paper
Multi-player Multi-armed Bandits with Delayed Feedback
2025, cites this paper
Decentralized Asynchronous Multi-player Bandits
2025, cites this paper
Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing
2025, cites this paper
LITE: Efficiently Estimating Gaussian Probability of Maximality
2025, cites this paper
CoCoB: Adaptive Collaborative Combinatorial Bandits for Online Recommendation
2025, cites this paper
Iterative Exploration-Driven Sparse SDP Clustering via Thompson Sampling
2025, influential citation
Thompson Sampling Policy for Dynamic Participating Client Scenario in Federated Learning
2025, cites this paper
Correcting for Position Bias in Learning to Rank: A Control Function Approach
2025, cites this paper
Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays
2024, cites this paper
Multiple-Plays Multi-armed Bandit with Accelerated Thompson Sampling
2024, influential citation
Stochastic Bandits for Egalitarian Assignment
2024, cites this paper
Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox
2024, cites this paper
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing
2024, cites this paper
A Context Augmented Multi-Play Multi-Armed Bandit Algorithm for Fast Channel Allocation in Opportunistic Spectrum Access
2024, cites this paper
The Online Shortest Path Problem: Learning Travel Times Using a Multiarmed Bandit Framework
2024, cites this paper
Multi-Armed Bandits with Interference
2024, cites this paper
Replicability is Asymptotically Free in Multi-armed Bandits
2024, cites this paper
Bandits with Preference Feedback: A Stackelberg Game Perspective
2024, cites this paper
Interactive preference analysis: A reinforcement learning framework
2024, cites this paper
Hybrid Cognition for Target Tracking in Cognitive Radar Networks
2023, cites this paper
Further Adaptive Best-of-Both-Worlds Algorithm for Combinatorial Semi-Bandits
2023, cites this paper
Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits
2023, cites this paper
Constant or logarithmic regret in asynchronous multiplayer bandits
2023, influential citation
Learning With Guarantee Via Constrained Multi-Armed Bandit: Theory and Network Applications
2023, cites this paper
When Combinatorial Thompson Sampling meets Approximation Regret
2023, cites this paper
Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints
2023, cites this paper
Optimal Routing to Parallel Servers With Unknown Utilities—Multi-Armed Bandit With Queues
2023, cites this paper
Contextual Bandits for Hyper-Personalization based on User Behavior in Local Domain
2023, influential citation
Short-lived High-volume Bandits
2023, cites this paper
Adversarial Sleeping Bandit Problems with Multiple Plays: Algorithm and Ranking Application
2023, cites this paper
UniRank: Unimodal Bandit Algorithms for Online Ranking
2022, cites this paper
Multi-target Search Using Teams of Robots Based on Thompson Sampling
2022, cites this paper
A survey on multi-player bandits
2022, cites this paper
Bridging Offline and Online Experimentation: Constraint Active Search for Deployed Performance Optimization
2022, cites this paper
Suboptimal Performance of the Bayes Optimal Algorithm in Frequentist Best Arm Identification
2022, cites this paper
Multiarmed Bandit Algorithms on Zynq System-on-Chip: Go Frequentist or Bayesian?
2022, cites this paper
Doubly-Robust Estimation for Unbiased Learning-to-Rank from Position-Biased Click Feedback
2022, cites this paper
Bayes Optimal Algorithm is Suboptimal in Frequentist Best Arm Identification
2022, cites this paper
UniRank: Unimodal Bandit Algorithm for Online Ranking
2022, cites this paper
Doubly Robust Estimation for Correcting Position Bias in Click Feedback for Unbiased Learning to Rank
2022, cites this paper
Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms
2022, influential citation
Reaching the End of Unbiasedness: Uncovering Implicit Limitations of Click-Based Learning to Rank
2022, cites this paper
Autonomous Drug Design with Multi-Armed Bandits
2022, cites this paper
An Online Learning Approach to Sequential User-Centric Selection Problems
2022, cites this paper
Single-pass Streaming Lower Bounds for Multi-armed Bandits Exploration with Instance-sensitive Sample Complexity
2022, cites this paper
The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle
2021, influential citation
Adversarial Online Learning with Variable Plays in the Pursuit-Evasion Game: Theoretical Foundations and Application in Connected and Automated Vehicle Cybersecurity
2021, cites this paper
Adapting Bandit Algorithms for Settings with Sequentially Available Arms
2021, cites this paper
Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits
2021, cites this paper
Approximated Policy Search in Black-Box Optimization
2021, cites this paper
Parametric Graph for Unimodal Ranking Bandit
2021, cites this paper
Algorithms for Data and Computation Privacy
2021, cites this paper
A High Performance, Low Complexity Algorithm for Multi-Player Bandits Without Collision Sensing Information
2021, cites this paper
Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or Bayesian?
2021, cites this paper
Active Bayesian Assessment of Black-Box Classifiers
2021, cites this paper
Bandit Algorithm for both Unknown Best Position and Best Item Display on Web Pages
2021, influential citation
Censored Semi-Bandits for Resource Allocation
2021, influential citation
Survey of multiarmed bandit algorithms applied to recommendation systems
2021, cites this paper
Blind Exploration and Exploitation of Stochastic Experts
2021, cites this paper
Stochastic Dueling Bandits with Adversarial Corruption
2021, cites this paper
Bayesian collective learning emerges from heuristic social learning
2021, cites this paper
Adaptive Sequence-Based Stimulus Selection in an ERP-Based Brain-Computer Interface by Thompson Sampling in a Multi-Armed Bandit Problem
2021, cites this paper
Risk-Aware Algorithms for Combinatorial Semi-Bandits
2021, cites this paper
Order recognition by Schubert polynomials generated by optical near-field statistics via nanometre-scale photochromism
2021, cites this paper
Decision tree Thompson sampling for mining hidden populations through attributed search
2021, cites this paper
Thompson Sampling-Based Antenna Selection With Partial CSI for TDD Massive MIMO Systems
2020, cites this paper
StreamingBandit: Experimenting with Bandit Policies
2020, cites this paper
On Thompson Sampling for Smoother-than-Lipschitz Bandits
2020, cites this paper
Thompson Sampling-Based Channel Selection Through Density Estimation Aided by Stochastic Geometry
2020, cites this paper
Selfish Robustness and Equilibria in Multi-Player Bandits
2020, cites this paper
Tight Lower Bounds for Combinatorial Multi-Armed Bandits
2020, cites this paper
Active Bayesian Assessment for Black-Box Classifiers
2020, cites this paper
Collaborative Online Edge Caching With Bayesian Clustering in Wireless Networks
2020, cites this paper
Smart vehicular communication via 5G mmWaves
2020, cites this paper
Decentralized Multi-player Multi-armed Bandits with No Collision Information
2020, cites this paper
Bounded Regret for Finitely Parameterized Multi-Armed Bandits
2020, cites this paper
Thompson Sampling-Based Heterogeneous Network Selection Considering Stochastic Geometry Analysis
2020, cites this paper
Learning to Cache and Caching to Learn: Regret Analysis of Caching Algorithms
2020, cites this paper
Investigation of Energy Management and Optimization Using Penalty Based Reinforcement Learning Algorithms for Textile Industry
2020, cites this paper
Differentially Private and Budget-Limited Bandit Learning over Matroids
2020, cites this paper
Learning-Aided Content Placement in Caching-Enabled fog Computing Systems Using Thompson Sampling
2020, cites this paper
Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits
2020, cites this paper
Stochastic Network Utility Maximization with Unknown Utilities: Multi-Armed Bandits Approach
2020, influential citation
Variable Selection Via Thompson Sampling
2020, influential citation
Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards
2020, cites this paper
Multi-Interface Channel Allocation in Fog Computing Systems using Thompson Sampling
2020, cites this paper
Online Bayesian Learning for Rate Selection in Millimeter Wave Cognitive Radio Networks
2020, cites this paper
Carousel Personalization in Music Streaming Apps with Contextual Bandits
2020, influential citation
Position-Based Multiple-Play Bandits with Thompson Sampling
2020, cites this paper
Bandits manchots avec échantillonnage de Thompson pour des recommandations multiples suivant un modèle fondé sur les positions
2020, cites this paper
On No-Sensing Adversarial Multi-Player Multi-Armed Bandits With Collision Communications
2020, cites this paper
Efficient Subspace Search in Data Streams
2020, influential citation
Learning-Based Reconfigurable Multiple Access Schemes for Virtualized MTC Networks
2020, cites this paper
Choice Bandits
2020, influential citation
Learning from user interactions with rankings
2020, influential citation
Learning-Based Reconfigurable Access Schemes for Virtualized M2M Networks
2020, cites this paper
Supplemental
2020, cites this paper
Accelerated learning from recommender systems using multi-armed bandit
2019, cites this paper