We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose an efficient algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. We prove a finite time expected regret upper bound of order O(√Kln(K)T) for this algorithm and a general lower bound of order Ω(√KT). At the end, we provide experimental results using real data from information retrieval applications.
A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits
Pratik Gajane,Tanguy Urvoy,F. Clérot
Published 2015 in International Conference on Machine Learning
ABSTRACT
PUBLICATION RECORD
- Publication year
2015
- Venue
International Conference on Machine Learning
- Publication date
2015-07-06
- Fields of study
Mathematics, Computer Science
- Identifiers
- External record
- Source metadata
Semantic Scholar
CITATION MAP
EXTRACTION MAP
CLAIMS
- No claims are published for this paper.
CONCEPTS
- No concepts are published for this paper.
REFERENCES
Showing 1-29 of 29 references · Page 1 of 1
CITED BY
Showing 1-50 of 50 citing papers · Page 1 of 1