A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

Pratik Gajane,Tanguy Urvoy,F. Clérot

Published 2015 in International Conference on Machine Learning

ABSTRACT

We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose an efficient algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. We prove a finite time expected regret upper bound of order O(√Kln(K)T) for this algorithm and a general lower bound of order Ω(√KT). At the end, we provide experimental results using real data from information retrieval applications.

PUBLICATION RECORD

  • Publication year

    2015

  • Venue

    International Conference on Machine Learning

  • Publication date

    2015-07-06

  • Fields of study

    Mathematics, Computer Science

  • Identifiers
  • External record

    Open on Semantic Scholar

  • Source metadata

    Semantic Scholar

CITATION MAP

EXTRACTION MAP

CLAIMS

  • No claims are published for this paper.

CONCEPTS

  • No concepts are published for this paper.

REFERENCES

Showing 1-29 of 29 references · Page 1 of 1

CITED BY

Showing 1-50 of 50 citing papers · Page 1 of 1