Online convex optimization in the bandit setting: gradient descent without a gradient

A. Flaxman,A. Kalai,H. B. McMahan

Published 2004 in ACM-SIAM Symposium on Discrete Algorithms

ABSTRACT

We study a general online convex optimization problem. We have a convex set <i>S</i> and an unknown sequence of cost functions <i>c</i><inf>1</inf>, <i>c</i><inf>2</inf>,..., and in each period, we choose a feasible point <i>x<inf>t</inf></i> in <i>S</i>, and learn the cost <i>c<inf>t</inf></i>(<i>x<inf>t</inf></i>). If the function <i>c<inf>t</inf></i> is also revealed after each period then, as Zinkevich shows in [25], gradient descent can be used on these functions to get regret bounds of <i>O</i>(√<i>n</i>). That is, after <i>n</i> rounds, the total cost incurred will be <i>O</i>(√<i>n</i>) more than the cost of the best single feasible decision chosen with the benefit of hindsight, min<inf><i>x</i></inf> Σ <i>ct</i>(<i>x</i>).We extend this to the "bandit" setting, where, in each period, only the cost <i>c<inf>t</inf></i>(<i>x<inf>t</inf></i>) is revealed, and bound the expected regret as <i>O</i>(<i>n</i><sup>3/4</sup>).Our approach uses a simple approximation of the gradient that is computed from evaluating <i>c<inf>t</inf></i> at a single (random) point. We show that this biased estimate is sufficient to approximate gradient descent on the sequence of functions. In other words, it is possible to use gradient descent without seeing anything more than the value of the functions at a single point. The guarantees hold even in the most general case: online against an adaptive adversary.For the online linear optimization problem [15], algorithms with low regrets in the bandit setting have recently been given against oblivious [1] and adaptive adversaries [19]. In contrast to these algorithms, which distinguish between explicit <i>explore</i> and <i>exploit</i> periods, our algorithm can be interpreted as doing a small amount of exploration in each period.

PUBLICATION RECORD

  • Publication year

    2004

  • Venue

    ACM-SIAM Symposium on Discrete Algorithms

  • Publication date

    2004-08-02

  • Fields of study

    Mathematics, Computer Science, Economics

  • Identifiers
  • External record

    Open on Semantic Scholar

  • Source metadata

    Semantic Scholar

CITATION MAP

EXTRACTION MAP

CLAIMS

  • No claims are published for this paper.

CONCEPTS

  • No concepts are published for this paper.

REFERENCES

Showing 1-27 of 27 references · Page 1 of 1

CITED BY

Showing 1-100 of 949 citing papers · Page 1 of 10