Thompson sampling with the online bootstrap

Published 2014 in arXiv.org

ABSTRACT

Thompson sampling solves bandit problems by allocating each new observation to an arm with probability equal to the posterior probability that the arm is optimal. While sometimes easy to implement and asymptotically optimal, Thompson sampling can be computationally demanding in large-scale bandit problems, and its performance depends on how well the model fits the observed data. We introduce bootstrap Thompson sampling (BTS), a heuristic method for solving bandit problems that modifies Thompson sampling by replacing the posterior distribution with a bootstrap distribution. We first explain BTS and show that its performance is competitive with Thompson sampling in the well-studied Bernoulli bandit case. Subsequently, we detail why BTS using the online bootstrap is more scalable than regular Thompson sampling, and we show through simulation that BTS is more robust to a misspecified error distribution. BTS is an appealing modification of Thompson sampling, especially when samples from the posterior are otherwise unavailable or costly.
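The idea in the abstract can be sketched for the Bernoulli bandit case. This is a minimal illustration, not the paper's reference implementation: it assumes a "double-or-nothing" online bootstrap (each replicate absorbs a new observation with probability 1/2), one pseudo-success and one pseudo-failure per replicate at the start, and deterministic tie-breaking. The names `BootstrapThompsonSampler`, `simulate`, and `n_replicates` are hypothetical and chosen only for this example.

```python
import random


class BootstrapThompsonSampler:
    """Illustrative bootstrap Thompson sampling for Bernoulli rewards."""

    def __init__(self, n_arms, n_replicates=100, seed=None):
        self.rng = random.Random(seed)
        self.n_arms = n_arms
        self.n_replicates = n_replicates
        # Assumption: every replicate starts with one pseudo-success and
        # one pseudo-failure, stored as [successes, failures].
        self.counts = [
            [[1.0, 1.0] for _ in range(n_replicates)] for _ in range(n_arms)
        ]

    def select_arm(self):
        """Draw one bootstrap replicate per arm and play the best estimate."""
        best_arm, best_value = 0, -1.0
        for arm in range(self.n_arms):
            successes, failures = self.counts[arm][
                self.rng.randrange(self.n_replicates)
            ]
            value = successes / (successes + failures)
            # Ties go to the lowest-indexed arm; random tie-breaking is omitted.
            if value > best_value:
                best_arm, best_value = arm, value
        return best_arm

    def update(self, arm, reward):
        """Online bootstrap step: each replicate of the played arm
        absorbs the observation with probability 1/2."""
        for replicate in self.counts[arm]:
            if self.rng.random() < 0.5:
                replicate[0] += reward
                replicate[1] += 1 - reward


def simulate(true_means, horizon, n_replicates=100, seed=0):
    """Run the sampler on simulated Bernoulli arms; return pulls per arm."""
    env_rng = random.Random(seed)
    sampler = BootstrapThompsonSampler(len(true_means), n_replicates, seed=seed + 1)
    pulls = [0] * len(true_means)
    for _ in range(horizon):
        arm = sampler.select_arm()
        reward = 1 if env_rng.random() < true_means[arm] else 0
        sampler.update(arm, reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(simulate([0.2, 0.8], horizon=2000))
```

Because each update touches only running counts, no posterior sampling routine is needed and the per-arm state is a fixed-size table, which is the scalability argument the abstract alludes to. The number of replicates trades memory against how finely the bootstrap distribution is approximated.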

PUBLICATION RECORD

Publication year
2014
Venue
arXiv.org
Publication date
2014-10-15
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1410.4009
Source metadata
Semantic Scholar

CITED BY

Leveraging priors on distribution functions for multi-arm bandits
2025, cites this paper
Spatio-Temporal Predictive Learning Using Crossover Attention for Communications and Networking Applications
2025, cites this paper
Estimating Causal Effects in Networks with Cluster-Based Bandits
2025, cites this paper
Dynamic Information Sub-Selection for Decision Support
2024, cites this paper
Integrating Hyperparameter Search into Model-Free AutoML with Context-Free Grammars
2024, cites this paper
Risk-Aware Antenna Selection for Multiuser Massive MIMO Under Incomplete CSI
2024, cites this paper
Multiplier Bootstrap-based Exploration
2023, influential citation
Particle Thompson Sampling with Static Particles
2023, cites this paper
An Analysis of Ensemble Sampling
2022, cites this paper
Hawkes Process Multi-armed Bandits for Search and Rescue
2022, cites this paper
Monte Carlo tree search algorithms for risk-aware and multi-objective reinforcement learning
2022, influential citation
A Multiplier Bootstrap Approach to Designing Robust Algorithms for Contextual Bandits
2022, cites this paper
Reinforcement Learning in Modern Biostatistics: Constructing Optimal Adaptive Interventions
2022, cites this paper
Residual Bootstrap Exploration for Stochastic Linear Bandit
2022, influential citation
Model-Based Reinforcement Learning from PILCO to PETS
2021, cites this paper
Risk Aware and Multi-Objective Decision Making with Distributional Monte Carlo Tree Search
2021, influential citation
Debiasing Samples from Online Learning Using Bootstrap
2021, cites this paper
Decision-Making Under Selective Labels: Optimal Finite-Domain Policies and Beyond
2021, cites this paper
Robust Contextual Bandits via Bootstrapping
2021, cites this paper
GuideBoot: Guided Bootstrap for Deep Contextual Bandits in Online Advertising
2021, cites this paper
Diffusion Approximations for Thompson Sampling
2021, cites this paper
CORe: Capitalizing On Rewards in Bandit Exploration
2021, cites this paper
Hawkes Process Multi-armed Bandits for Disaster Search and Rescue
2020, cites this paper
StreamingBandit: Experimenting with Bandit Policies
2020, influential citation
Q-Learning: Theory and Applications
2020, cites this paper
Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective
2020, cites this paper
Targeting for long-term outcomes
2020, influential citation
Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning
2020, cites this paper
Policy Optimization as Online Learning with Mediator Feedback
2020, cites this paper
Perturbed-History Exploration in Stochastic Linear Bandits
2019, cites this paper
Continuous-Time Birth-Death MCMC for Bayesian Regression Tree Models
2019, cites this paper
Old Dog Learns New Tricks: Randomized UCB for Bandit Problems
2019, cites this paper
Personalized real-time anomaly detection and health feedback for older adults
2019, cites this paper
Bootstrap Thompson Sampling and Sequential Decision Problems in the Behavioral Sciences
2019, influential citation
Bootstrapping Upper Confidence Bound
2019, cites this paper
Personalization in biomedical-informatics: Methodological considerations and recommendations
2019, cites this paper
Budgeted Policy Learning for Task-Oriented Dialogue Systems
2019, cites this paper
On Applications of Bootstrap in Continuous Space Reinforcement Learning
2019, cites this paper
Optimal treatment allocations in space and time for on‐line control of an emerging infectious disease
2018, cites this paper
Optimal design of experiments to identify latent behavioral types
2018, cites this paper
Practical Evaluation and Optimization of Contextual Bandit Algorithms
2018, cites this paper
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
2018, cites this paper
Adapting multi-armed bandits policies to contextual bandits scenarios
2018, cites this paper
contextual: Evaluating Contextual Multi-Armed Bandit Problems in R
2018, cites this paper
Fast Model-Selection through Adapting Design of Experiments Maximizing Information Gain
2018, cites this paper
A Contextual Bandit Bake-off
2018, influential citation
New Insights into Bootstrapping for Bandits
2018, cites this paper
Adapting to Concept Drift in Credit Card Transaction Data Streams Using Contextual Bandits and Decision Trees
2018, cites this paper
Structured bandits and applications: exploiting problem structure for better decision-making under uncertainty
2018, influential citation
A Tutorial on Thompson Sampling
2017, cites this paper
StreamingBandit: Developing Adaptive Persuasive Systems
2016, cites this paper
Improving Online Marketing Experiments with Drifting Multi-armed Bandits
2015, cites this paper
A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit
2015, cites this paper
Bootstrapped Thompson Sampling and Deep Exploration
2015, cites this paper
Improving Experimental Designs through Adaptive Treatment Allocation
2015, cites this paper
Essays in Industrial Organization and Econometrics
2015, cites this paper
Lock in Feedback in Sequential Experiment
2015, cites this paper
"One Weird Trick" for Advertising Outcomes: An Exploration of the Multi-Armed Bandit for Performance-Driven Marketing
2015, influential citation