Regret bounds for restless Markov bandits

Published 2012 in Theoretical Computer Science

ABSTRACT

We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We propose an algorithm that, after T steps, achieves $\tilde{O}(\sqrt{T})$ regret with respect to the best policy that knows the distributions of all arms. The only assumption made on the Markov chains is that they are irreducible. In addition, we show that index-based policies are necessarily suboptimal for this problem.
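To make the setting concrete, here is a minimal Python sketch of a restless Markov bandit environment. It is an illustration under our own assumptions, not the paper's algorithm: the class `RestlessMarkovArm`, the functions `run` and `make_arms`, and the two toy arms are all hypothetical. Each arm's state moves at every step whether or not it is pulled, and only the pulled arm pays a reward. The toy arms are deterministic two-state cycles (irreducible, though periodic) running out of phase, so a policy that uses the known dynamics to switch arms earns twice as much as any policy that sticks to one arm. This illustrates why, in the restless setting, the natural benchmark is the best policy that knows the arms' distributions rather than the best single arm.

```python
import random
from typing import Callable, List, Sequence


class RestlessMarkovArm:
    """Hypothetical arm whose state follows its own Markov chain at every
    time step, whether or not it is pulled (the 'restless' property)."""

    def __init__(self, transition: Sequence[Sequence[float]],
                 rewards: Sequence[float], state: int = 0) -> None:
        self.transition = transition  # transition[s][s2] = P(next state s2 | state s)
        self.rewards = rewards        # reward collected if pulled while in state s
        self.state = state

    def step(self, rng: random.Random) -> None:
        # Draw the next state from the row of the current state.
        row = self.transition[self.state]
        self.state = rng.choices(range(len(row)), weights=row)[0]


def run(arms: List[RestlessMarkovArm], policy: Callable[[int], int],
        horizon: int, seed: int = 0) -> float:
    """Total reward of `policy` (a map from time step to arm index)."""
    rng = random.Random(seed)
    total = 0.0
    for t in range(horizon):
        i = policy(t)
        total += arms[i].rewards[arms[i].state]  # only the pulled arm pays out
        for arm in arms:                         # every arm evolves, pulled or not
            arm.step(rng)
    return total


def make_arms() -> List[RestlessMarkovArm]:
    # Hypothetical toy arms: a deterministic 2-cycle (irreducible, periodic),
    # paying reward 1 in state 1 and 0 in state 0.
    flip = [[0.0, 1.0], [1.0, 0.0]]
    return [RestlessMarkovArm(flip, [0.0, 1.0], state=1),  # pays at even t
            RestlessMarkovArm(flip, [0.0, 1.0], state=0)]  # pays at odd t


T = 1000
# Best policy restricted to always pulling one fixed arm.
best_single = max(run(make_arms(), lambda t, i=i: i, T) for i in range(2))
# Policy that knows the dynamics (and, here, the phase) and switches every step.
switching = run(make_arms(), lambda t: t % 2, T)
print(best_single, switching)  # prints: 500.0 1000.0
```

Fresh arms are built for every run so that each policy starts from the same initial states. In this toy case the switching policy collects the maximum possible reward of 1 per step, while each fixed arm pays only every other step.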

PUBLICATION RECORD

Publication year: 2012
Venue: Theoretical Computer Science
Publication date: 2012-09-12
Fields of study: Mathematics, Computer Science
Identifiers: DOI 10.1016/j.tcs.2014.09.026; arXiv 1209.2693
Source metadata: Semantic Scholar

REFERENCES

What is Theoretical Computer Science
2014 · cited by this paper
Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning
2013 · cited by this paper
Markov chains and mixing times
2013 · cited by this paper
Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
2012 · cited by this paper
Optimally Sensing a Single Channel Without Prior Information: The Tiling Algorithm and Regret Bounds
2011 · cited by this paper
Selecting the State-Representation in Reinforcement Learning
2011 · cited by this paper
Advances in Neural Information Processing Systems 24
2011 · cited by this paper
Adaptive learning of uncontrolled restless bandits with logarithmic regret
2011 · influential reference
Minimax Policies for Adversarial and Stochastic Bandits
2009 · cited by this paper
REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs
2009 · cited by this paper
Bounding Performance Loss in Approximate MDP Homomorphisms
2008 · cited by this paper
A survey on spectrum management in cognitive radio networks
2008 · cited by this paper
On the Possibility of Learning in Reactive Environments with Arbitrary Dependence
2008 · cited by this paper
Near-optimal Regret Bounds for Reinforcement Learning
2008 · influential reference
Pseudometrics for State Aggregation in Average Reward Markov Decision Processes
2007 · cited by this paper
Equivalence notions and model minimization in Markov decision processes
2003 · influential reference
Finite-time Analysis of the Multiarmed Bandit Problem
2002 · cited by this paper
Model Minimization in Hierarchical Reinforcement Learning
2002 · cited by this paper
The Nonstochastic Multiarmed Bandit Problem
2002 · influential reference
Markov Decision Processes: Discrete Stochastic Dynamic Programming
1994 · cited by this paper
Threshold limits for cover times
1991 · cited by this paper
Lower bounds for covering times for reversible Markov chains and random walks on graphs
1989 · cited by this paper
Restless bandits: activity allocation in a changing world
1988 · cited by this paper
Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards
1987 · cited by this paper
On Chebyshev-Type Inequalities for Primes
1982 · cited by this paper
Bandit processes and dynamic allocation indices
1979 · cited by this paper
Asymptotically Efficient Adaptive Allocation Rules
year unknown · influential reference

CITED BY

Online Social Welfare Function-based Resource Allocation
2026 · cites this paper
Adaptive Scheduling: A Reinforcement Learning Whittle Index Approach for Wireless Sensor Networks
2026 · cites this paper
A Control Theory inspired Exploration Method for a Linear Bandit driven by a Linear Gaussian Dynamical System
2025 · cites this paper
Thompson Sampling For Bandits With Cool-Down Periods
2025 · cites this paper
On Restless Linear Bandits
2025 · cites this paper
From Restless to Contextual: A Thresholding Bandit Reformulation For Finite-horizon Performance
2025 · cites this paper
From Restless to Contextual: A Thresholding Bandit Approach to Improve Finite-horizon Performance
2025 · cites this paper
Tabular and Deep Learning for the Whittle Index
2024 · cites this paper
Restless Linear Bandits
2024 · cites this paper
Structured Reinforcement Learning for Delay-Optimal Data Transmission in Dense mmWave Networks
2024 · cites this paper
Restless Bandit Problem with Rewards Generated by a Linear Gaussian Dynamical System
2024 · cites this paper
Faster Q-Learning Algorithms for Restless Bandits
2024 · cites this paper
A Federated Online Restless Bandit Framework for Cooperative Resource Allocation
2024 · cites this paper
An Adaptive Method for Contextual Stochastic Multi-armed Bandits with Rewards Generated by a Linear Dynamical System
2024 · cites this paper
DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback
2024 · cites this paper
Predictive reinforcement learning in non-stationary environments using weighted mixture policy
2024 · cites this paper
Fairness of Exposure in Online Restless Multi-armed Bandits
2024 · cites this paper
Optimal Best Arm Identification With Fixed Confidence in Restless Bandits
2023 · cites this paper
Online Restless Multi-Armed Bandits with Long-Term Fairness Constraints
2023 · cites this paper
From Stream to Pool: Pricing Under the Law of Diminishing Marginal Utility
2023 · cites this paper
Reinforcement Learning for Dynamic Dimensioning of Cloud Caches: A Restless Bandit Approach
2023 · cites this paper
Finite-Time Analysis of Whittle Index based Q-Learning for Restless Multi-Armed Bandits with Neural Network Function Approximation
2023 · cites this paper
Online Restless Bandits with Unobserved States
2023 · cites this paper
Exploit or Explore? An Empirical Study of Resource Allocation in Scientific Labs
2023 · cites this paper
Scaling Up Q-Learning via Exploiting State–Action Equivalence
2023 · cites this paper
Markovian Restless Bandits and Index Policies: A Review
2023 · cites this paper
Multi-armed bandit problem with online clustering as side information
2023 · cites this paper
Linear Bandits with Memory: from Rotting to Rising
2023 · cites this paper
Learning Infinite-Horizon Average-Reward Restless Multi-Action Bandits via Index Awareness
2022 · influential citation
On Learning Whittle Index Policy for Restless Bandits With Scalable Regret
2022 · cites this paper
A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits
2022 · cites this paper
Whittle Index-Based Q-Learning for Wireless Edge Caching With Linear Function Approximation
2022 · cites this paper
Dynamic Bandits with Temporal Structure
2022 · cites this paper
Information-Gathering in Latent Bandits
2022 · cites this paper
Autoregressive Bandits
2022 · cites this paper
Networked Restless Bandits with Positive Externalities
2022 · cites this paper
Stochastic Rising Bandits
2022 · cites this paper
Reinforcement Learning Augmented Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits
2022 · influential citation
Non-Stationary Bandits with Auto-Regressive Temporal Dependency
2022 · cites this paper
Index-aware reinforcement learning for adaptive video streaming at the wireless edge
2022 · influential citation
An Analysis of Abstracted Model-Based Reinforcement Learning
2022 · cites this paper
Reinforcement Learning for Dynamic Dimensioning of Cloud Caches: A Restless Bandit Approach
2022 · cites this paper
Optimistic Whittle Index Policy: Online Learning for Restless Bandits
2022 · cites this paper
An Analysis of Model-Based Reinforcement Learning From Abstracted Observations
2022 · influential citation
Nonstationary Bandit Learning via Predictive Sampling
2022 · cites this paper
Model-free Reinforcement Learning for Content Caching at the Wireless Edge via Restless Bandits
2022 · cites this paper
Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits
2021 · influential citation
A Novel Implementation of Q-Learning for the Whittle Index
2021 · cites this paper
An online algorithm for the risk-aware restless bandit
2021 · cites this paper
Deep Reinforcement Learning Techniques in Diversified Domains: A Survey
2021 · cites this paper
Learning to Detect an Odd Restless Markov Arm
2021 · cites this paper
Planning to Fairly Allocate: Probabilistic Fairness in the Restless Bandit Setting
2021 · cites this paper
Learning Algorithms for Markovian Bandits: Is Posterior Sampling more Scalable than Optimism?
2021 · cites this paper
Sublinear regret for learning POMDPs
2021 · cites this paper
Best Model Identification: A Rested Bandit Formulation
2021 · cites this paper
Detecting an Odd Restless Markov Arm With a Trembling Hand
2021 · influential citation
Offline RL With Resource Constrained Online Deployment
2021 · cites this paper
Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits
2020 · influential citation
Rebounding Bandits for Modeling Satiation Effects
2020 · cites this paper
Corrupted Contextual Bandits with Action Order Constraints
2020 · influential citation
Online Model Selection: a Rested Bandit Formulation
2020 · cites this paper
Restless Hidden Markov Bandit with Linear Rewards
2020 · cites this paper
A new bandit setting balancing information from state evolution and corrupted context
2020 · influential citation
Bandit Algorithms
2020 · cites this paper
Regime Switching Bandits
2020 · influential citation
Fair Bandit Learning with Delayed Impact of Actions
2020 · cites this paper
Detecting an Odd Restless Markov Arm with a Trembling Hand
2020 · cites this paper
Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards
2020 · cites this paper
Expert Selection in High-Dimensional Markov Decision Processes
2020 · cites this paper
Bandit Learning with Delayed Impact of Actions
2020 · cites this paper
The Restless Hidden Markov Bandit With Linear Rewards and Side Information
2019 · cites this paper
Extending hardware transactional memory capacity via rollback-only transactions and suspend/resume
2019 · cites this paper
Variational Regret Bounds for Reinforcement Learning
2019 · cites this paper
Learning Multiple Markov Chains via Adaptive Allocation
2019 · cites this paper
Bandit Learning with Biased Human Feedback
2019 · cites this paper
Regret Bounds for Thompson Sampling in Restless Bandit Problems
2019 · cites this paper
Restless dependent bandits with fading memory
2019 · cites this paper
Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes
2019 · cites this paper
Sequential decision problems in online education
2019 · influential citation
Reinforcement Learning: a Comparison of UCB Versus Alternative Adaptive Policies
2019 · cites this paper
A Policy for Optimizing Sub-Band Selection Sequences in Wideband Spectrum Sensing
2019 · cites this paper
Thompson Sampling in Non-Episodic Restless Bandits
2019 · influential citation
Restless Hidden Markov Bandits with Linear Rewards
2019 · cites this paper
Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems
2019 · cites this paper
Fighting Boredom in Recommender Systems with Linear Reinforcement Learning
2018 · cites this paper
Recharging Bandits
2018 · cites this paper
Bandit algorithms for real-time data capture on large social medias
2018 · cites this paper
Deep Reinforcement Learning for Dynamic Multichannel Access in Wireless Networks
2018 · cites this paper
Incentives in the Dark: Multi-armed Bandits for Evolving Users with Unknown Type
2018 · cites this paper
Multi-Armed Bandits for Correlated Markovian Environments with Smoothed Reward Feedback
2018 · cites this paper
Reinforcement Learning Algorithm Selection
2017 · cites this paper
Discrepancy-Based Algorithms for Non-Stationary Rested Bandits
2017 · cites this paper
A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity
2017 · influential citation
Approximations of the Restless Bandit Problem
2017 · cites this paper
A Multi-Armed Bandit Approach for Online Expert Selection in Markov Decision Processes
2017 · cites this paper
Variational Thompson Sampling for Relational Recurrent Bandits
2017 · cites this paper
Multi-Armed Bandits with Non-Stationary Rewards
2017 · cites this paper
Multi-armed Bandits: Competing with Optimal Sequences
2016 · cites this paper
Optimal data utilization for goal-oriented learning
2016 · cites this paper
Machine learning methods for spectrum exploration and exploitation
2016 · cites this paper