Thompson Sampling for Contextual Bandits with Linear Payoffs
Published 2012 in International Conference on Machine Learning
ABSTRACT
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance than state-of-the-art methods. However, many questions regarding its theoretical performance remained open. In this paper, we design and analyze a generalization of the Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary. This is among the most important and widely studied versions of the contextual bandits problem. We prove a high probability regret bound of O((d^2/ε)·√(T^(1+ε))) in time T for any 0 < ε < 1, where d is the dimension of each context vector and ε is a parameter used by the algorithm. Our results provide the first theoretical guarantees for the contextual version of Thompson Sampling, and are close to the lower bound of Ω(d√T) for this problem. This essentially solves the COLT open problem of Chapelle and Li [COLT 2012].
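For orientation, the following is a minimal sketch of the kind of algorithm the abstract describes: Thompson Sampling with a Gaussian posterior over a linear payoff parameter. The helper names get_contexts and get_reward and the exploration scale v are illustrative assumptions, not the paper's notation; the paper ties the sampling variance to d, ε, and a confidence parameter.

    import numpy as np

    def linear_thompson_sampling(get_contexts, get_reward, T, d, v=1.0, seed=0):
        # Sketch of Thompson Sampling with linear payoffs (hypothetical helpers):
        #   get_contexts(t) -> array of shape (num_arms, d), the adversary's context vectors
        #   get_reward(t, arm) -> observed stochastic reward for the chosen arm
        #   v scales exploration; here it is a free parameter for illustration only.
        rng = np.random.default_rng(seed)
        B = np.eye(d)        # posterior precision, initialized to the identity
        f = np.zeros(d)      # running sum of reward-weighted contexts
        for t in range(T):
            mu_hat = np.linalg.solve(B, f)                   # posterior mean
            cov = v ** 2 * np.linalg.inv(B)                  # posterior covariance
            mu_tilde = rng.multivariate_normal(mu_hat, cov)  # sample a parameter vector
            contexts = get_contexts(t)                       # shape (num_arms, d)
            arm = int(np.argmax(contexts @ mu_tilde))        # play the best arm under the sample
            b = contexts[arm]
            r = get_reward(t, arm)
            B += np.outer(b, b)                              # update Gaussian posterior
            f += r * b

The sampling step is what distinguishes Thompson Sampling from deterministic confidence-bound methods: exploration comes from drawing the parameter vector from the posterior rather than from an explicit bonus term.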
PUBLICATION RECORD
- Publication year: 2012
- Venue: International Conference on Machine Learning
- Publication date: 2012-09-14
- Fields of study: Mathematics, Computer Science
- Source metadata: Semantic Scholar