Combinatorial Semi-Bandit in the Non-Stationary Environment

Published 2020 in Conference on Uncertainty in Artificial Intelligence

ABSTRACT

In this paper, we investigate the non-stationary combinatorial semi-bandit problem, both in the switching case and in the dynamic case. In the general case where (a) the reward function is non-linear, (b) arms may be probabilistically triggered, and (c) only an approximate offline oracle exists \cite{wang2017improving}, our algorithm achieves $\tilde{\mathcal{O}}(\sqrt{\mathcal{S} T})$ distribution-dependent regret in the switching case, and $\tilde{\mathcal{O}}(\mathcal{V}^{1/3}T^{2/3})$ in the dynamic case, where $\mathcal S$ is the number of switchings and $\mathcal V$ is the sum of the total ``distribution changes''. The regret bounds in both scenarios are nearly optimal, but our algorithm needs to know the parameter $\mathcal S$ or $\mathcal V$ in advance. We further show that, by employing another technique, our algorithm no longer needs to know $\mathcal S$ or $\mathcal V$, although the regret bounds may become suboptimal. In the special case where the reward function is linear and an exact oracle is available, we design a parameter-free algorithm that achieves nearly optimal regret in both the switching case and the dynamic case without knowing the parameters in advance.
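To make the two non-stationarity measures in the abstract concrete, the following is a minimal Python sketch, not the paper's code. It assumes the environment is described by a table `means[t][i]` of per-round expected outcomes for each base arm, that $\mathcal S$ counts the rounds at which the mean vector changes, and that $\mathcal V$ sums the largest per-arm change between consecutive rounds. The paper's exact definitions (for example, whether $\mathcal S$ counts change points or stationary segments, and which norm is used for $\mathcal V$) may differ. The function names are hypothetical, and the rate helpers drop constants and logarithmic factors.

```python
import math
from typing import Sequence, Tuple


def nonstationarity_measures(means: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (S, V) for a table where means[t][i] is arm i's mean at round t.

    Assumption: S counts rounds at which the mean vector changes, and V sums
    the largest absolute per-arm change between consecutive rounds.
    """
    switches = 0
    variation = 0.0
    for prev, cur in zip(means, means[1:]):
        change = max(abs(c - p) for p, c in zip(prev, cur))
        if change > 0:
            switches += 1
        variation += change
    return switches, variation


def switching_rate(num_switches: float, horizon: int) -> float:
    """sqrt(S * T): the switching-case rate, without constants or log factors."""
    return math.sqrt(num_switches * horizon)


def dynamic_rate(variation: float, horizon: int) -> float:
    """V^(1/3) * T^(2/3): the dynamic-case rate, without constants or log factors."""
    return variation ** (1.0 / 3.0) * horizon ** (2.0 / 3.0)


if __name__ == "__main__":
    # Two base arms over six rounds, with a single abrupt change after round 3.
    example = [[0.5, 0.5]] * 3 + [[0.75, 0.25]] * 3
    s, v = nonstationarity_measures(example)
    print("S =", s, "V =", v)
    print("switching rate:", switching_rate(s, len(example)))
    print("dynamic rate:", dynamic_rate(v, len(example)))
```

Under these assumptions, both rates grow sublinearly in $T$ when $\mathcal S$ or $\mathcal V$ is fixed, which is why a bound of either form implies vanishing average regret.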

PUBLICATION RECORD

Publication year
2020
Venue
Conference on Uncertainty in Artificial Intelligence
Publication date
2020-02-10
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 2002.03580

REFERENCES

Bandit Algorithms
2020, cited by this paper
A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits
2019, cited by this paper
Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes
2019, cited by this paper
Near-optimal Oracle-efficient Algorithms for Stationary and Non-Stationary Stochastic Linear Bandits
2019, cited by this paper
Online Second Price Auction with Semi-bandit Feedback Under the Non-Stationary Setting
2019, cited by this paper
Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
2019, cited by this paper
A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free
2019, influential reference
Weighted Linear Bandits for Non-Stationary Environments
2019, cited by this paper
Nearly Optimal Algorithms for Piecewise-Stationary Cascading Bandits
2019, influential reference
Learning to Optimize under Non-Stationarity
2018, cited by this paper
A Change-Detection based Framework for Piecewise-stationary Multi-Armed Bandit Problem
2017, cited by this paper
Tracking the Best Expert in Non-stationary Stochastic Environments
2017, cited by this paper
Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications
2017, influential reference
Efficient Contextual Bandits in Non-stationary Worlds
2017, cited by this paper
Combinatorial Multi-Armed Bandit with General Reward Functions
2016, influential reference
Combinatorial Bandits Revisited
2015, cited by this paper
Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
2015, cited by this paper
Cascading Bandits: Learning to Rank in the Cascade Model
2015, influential reference
Combinatorial Cascading Bandits
2015, influential reference
Matroid Bandits: Fast Combinatorial Optimization with Learning
2014, cited by this paper
Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards
2014, influential reference
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
2014, cited by this paper
Combinatorial Multi-Armed Bandit and Its Extension to Probabilistically Triggered Arms
2014, cited by this paper
Combinatorial multi-armed bandit: general framework, results and applications
2013, influential reference
Non-Stationary Stochastic Optimization
2013, cited by this paper
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
2012, cited by this paper
On Upper-Confidence Bound Policies for Switching Bandit Problems
2011, influential reference
Combinatorial Network Optimization with Unknown Variables: Multi-Armed Bandits with Linear Rewards
2010, cited by this paper
The On-Line Shortest Path Problem Under Partial Monitoring
2007, cited by this paper
Finite-time Analysis of the Multiarmed Bandit Problem
2002, influential reference
The Nonstochastic Multiarmed Bandit Problem
2002, cited by this paper
Mathematics of operations research
1998, cited by this paper
A Constructive Proof of the Representation Theorem for Polyhedral Sets Based on Fundamental Definitions
1987, cited by this paper
Some aspects of the sequential design of experiments
1952, cited by this paper
On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples
1933, cited by this paper

CITED BY

Achieving Optimal Static and Dynamic Regret Simultaneously in Bandits with Deterministic Losses
2026, cites this paper
Exploring Multiple High-Scoring Subspaces in Generative Flow Networks
2026, cites this paper
Multi-Play Combinatorial Semi-Bandit Problem
2025, cites this paper
Finite-Time Guarantees for Multi-Agent Combinatorial Bandits with Nonstationary Rewards
2025, cites this paper
Bandit-Based Charging with Beamforming for Mobile Wireless-Powered IoT Systems
2025, cites this paper
Doing More With Less: Balancing Probing Costs and Task Offloading Efficiency At the Network Edge
2025, cites this paper
Constrained Feedback Learning for Non-Stationary Multi-Armed Bandits
2025, cites this paper
Combinatorial Rising Bandits
2024, influential citation
Matroid Semi-Bandits in Sublinear Time
2024, cites this paper
Learning-based Scheduling for Information Gathering with QoS Constraints
2024, cites this paper
Variance-Dependent Regret Bounds for Non-stationary Linear Bandits
2024, cites this paper
Learning to Schedule in Non-Stationary Wireless Networks With Unknown Statistics
2023, influential citation
MNL-Bandit in non-stationary environments
2023, cites this paper
Non-Stationary Delayed Combinatorial Semi-Bandit With Causally Related Rewards
2023, cites this paper
Exploit or Explore? An Empirical Study of Resource Allocation in Scientific Labs
2023, cites this paper
Piecewise-Stationary Combinatorial Semi-Bandit with Causally Related Rewards
2023, cites this paper
PacketGame: Multi-Stream Packet Gating for Concurrent Video Inference at Scale
2023, cites this paper
BORA: Bayesian Optimization for Resource Allocation
2022, cites this paper
Learning to Control under Time-Varying Environment
2022, cites this paper
Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments
2022, cites this paper
A resource allocation scheme for D2D communications with unknown channel state information
2022, cites this paper
Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach
2021, influential citation
Online Energy-optimal Routing for Electric Vehicles with Combinatorial Multi-arm Semi-Bandit
2020, cites this paper
Nearly Optimal Algorithms for Piecewise-Stationary Cascading Bandits
2019, cites this paper
A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits
2019, cites this paper