Duelling Bandits with Weak Regret in Adversarial Environments

Published 2018 in arXiv.org

ABSTRACT

Research on the multi-armed bandit problem has studied the trade-off between exploration and exploitation in depth. However, there are numerous applications where the cardinal, absolute-valued feedback model (e.g. ratings from one to five) is not suitable. This has motivated the formulation of the duelling bandits problem, where the learner picks a pair of actions and observes noisy binary feedback indicating a relative preference between the two. Many different settings and interpretations of the problem exist, for two reasons. First, because the actions need not admit a total order, there is no natural definition of the best action. Existing work either explicitly assumes the existence of a linear order or uses a custom definition of the winner. Second, there are multiple reasonable notions of regret for measuring the learner's performance. Most prior work has focussed on the $\textit{strong regret}$, which averages the quality of the two actions picked. This work focusses on the $\textit{weak regret}$, which is based on the quality of the better of the two actions selected. Weak regret is the more appropriate performance measure when the pair's inferior action has no significant detrimental effect on the pair's quality. We study the duelling bandits problem in the adversarial setting. We provide an algorithm with theoretical guarantees in both the utility-based setting, which implies a total order, and the unrestricted setting. For the latter, we work with the $\textit{Borda winner}$, the action that maximises the probability of winning against an action sampled uniformly at random. The thesis concludes with experimental results on both real-world and synthetic data, showing the algorithm's performance and limitations.
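
The notions named in the abstract can be made concrete with a small numerical sketch. The code below is illustrative only and is not taken from the thesis: the preference matrix `P`, the function names `borda_scores` and `pair_regrets`, and the choice to measure an action's quality gap as its distance from the best Borda score are all assumptions made for this example. The opponent is sampled uniformly from the other actions; when the diagonal of `P` is 0.5, including the action itself would rescale the scores but not change which action is the Borda winner.

```python
import numpy as np


def borda_scores(P):
    """Borda score of each action for a preference matrix P (hypothetical helper).

    P[i, j] is the probability that action i beats action j.
    The score of i is its average win probability against an opponent
    drawn uniformly at random from the other K - 1 actions.
    """
    P = np.asarray(P, dtype=float)
    K = P.shape[0]
    return (P.sum(axis=1) - np.diag(P)) / (K - 1)


def pair_regrets(P, i, j):
    """Return (strong, weak) per-round regret of playing the pair (i, j).

    Assumption for this sketch: the gap of an action is the best Borda
    score minus its own Borda score.
    strong regret averages the two gaps; weak regret keeps only the
    gap of the better action of the pair.
    """
    b = borda_scores(P)
    gaps = b.max() - b
    strong = 0.5 * (gaps[i] + gaps[j])
    weak = min(gaps[i], gaps[j])
    return strong, weak


if __name__ == "__main__":
    # Synthetic 3-action example; action 0 is the Borda winner.
    P = [[0.5, 0.6, 0.8],
         [0.4, 0.5, 0.7],
         [0.2, 0.3, 0.5]]
    print("Borda scores:", borda_scores(P))
    print("pair (0, 2):", pair_regrets(P, 0, 2))
    print("pair (1, 2):", pair_regrets(P, 1, 2))
```

In this example, pairing the Borda winner with the worst action incurs zero weak regret but positive strong regret, which is the distinction the abstract draws between the two measures.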

PUBLICATION RECORD

Publication year
2018
Venue
arXiv.org
Publication date
2018-12-10
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1812.04152
Source metadata
Semantic Scholar

REFERENCES

Dueling Bandits with Weak Regret (2017, influential reference)
Dueling Bandits with Dependent Arms (2016, influential reference)
Double Thompson Sampling for Dueling Bandits (2016, influential reference)
Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm (2016, influential reference)
Instance-dependent Regret Bounds for Dueling Bandits (2016, influential reference)
Copeland Dueling Bandits (2015, influential reference)
A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits (2015, influential reference)
Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem (2015, cited by this paper)
Sparse Dueling Bandits (2015, influential reference)
Contextual Dueling Bandits (2015, influential reference)
One Practical Algorithm for Both Stochastic and Adversarial Bandits (2014, influential reference)
Reducing Dueling Bandits to Cardinal Bandits (2014, influential reference)
Generic Exploration and K-armed Voting Bandits (2013, influential reference)
Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem (2013, influential reference)
The K-armed Dueling Bandits Problem (2012, influential reference)
Beat the Mean Bandit (2011, influential reference)
An Empirical Evaluation of Thompson Sampling (2011, cited by this paper)
Interactively optimizing information retrieval systems as a dueling bandits problem (2009, influential reference)
How does clickthrough data reflect retrieval quality? (2008, influential reference)
A Short Introduction to Computational Social Choice (2007, influential reference)
Nantonac collaborative filtering: recommendation based on order responses (2003, influential reference)
The Nonstochastic Multiarmed Bandit Problem (2002, influential reference)
Finite-time Analysis of the Multiarmed Bandit Problem (2002, influential reference)
The Best of Both Worlds: Stochastic and Adversarial Bandits, 25th Annual Conference on Learning Theory (year unknown, cited by this paper)

CITED BY

Non-Stationary Dueling Bandits Under a Weighted Borda Criterion (2024, cites this paper)