Online convex optimization in the bandit setting: gradient descent without a gradient

Published 2004 in ACM-SIAM Symposium on Discrete Algorithms

ABSTRACT

We study a general online convex optimization problem. We have a convex set S and an unknown sequence of cost functions c_1, c_2, ..., and in each period, we choose a feasible point x_t in S, and learn the cost c_t(x_t). If the function c_t is also revealed after each period then, as Zinkevich shows in [25], gradient descent can be used on these functions to get regret bounds of O(√n). That is, after n rounds, the total cost incurred will be O(√n) more than the cost of the best single feasible decision chosen with the benefit of hindsight, min_x Σ c_t(x).

We extend this to the "bandit" setting, where, in each period, only the cost c_t(x_t) is revealed, and bound the expected regret as O(n^(3/4)).

Our approach uses a simple approximation of the gradient that is computed from evaluating c_t at a single (random) point. We show that this biased estimate is sufficient to approximate gradient descent on the sequence of functions. In other words, it is possible to use gradient descent without seeing anything more than the value of the functions at a single point. The guarantees hold even in the most general case: online against an adaptive adversary.

For the online linear optimization problem [15], algorithms with low regrets in the bandit setting have recently been given against oblivious [1] and adaptive adversaries [19]. In contrast to these algorithms, which distinguish between explicit explore and exploit periods, our algorithm can be interpreted as doing a small amount of exploration in each period.
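
As a rough illustration of the single-point idea described in the abstract, the Python sketch below runs projected gradient descent using only one cost evaluation per period. It assumes details the abstract does not spell out: the feasible set S is taken to be a Euclidean ball centered at the origin, the random point is x_t = y_t + delta * u with u drawn uniformly from the unit sphere, and the gradient estimate is (d / delta) * c_t(x_t) * u, where d is the dimension. The function names and the parameters radius, delta and step are hypothetical, and the default values are placeholders rather than the tuned settings behind the O(n^(3/4)) bound.

```python
import numpy as np


def random_unit_vector(dim, rng):
    """Draw a point uniformly at random from the unit sphere in R^dim."""
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)


def one_point_gradient_estimate(value, u, delta):
    """Gradient estimate built from one observed cost value.

    value is the cost observed at y + delta * u, and u is the unit
    direction that was used to perturb y.
    """
    return (u.size / delta) * value * u


def project_to_ball(y, radius):
    """Euclidean projection onto the origin-centered ball of given radius."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)


def bandit_gradient_descent(costs, dim, radius=1.0, delta=0.1, step=0.01, seed=0):
    """Play one point per cost function, seeing only the cost at that point.

    Assumes (for illustration) that the feasible set is the ball of the
    given radius and that 0 < delta < radius.
    Returns the played points and the observed costs.
    """
    rng = np.random.default_rng(seed)
    y = np.zeros(dim)  # center of exploration, kept inside the shrunken ball
    played, observed = [], []
    for cost in costs:
        u = random_unit_vector(dim, rng)
        x = y + delta * u   # the single point actually played
        value = cost(x)     # the only feedback available
        g = one_point_gradient_estimate(value, u, delta)
        # Project onto a ball shrunk by delta so the next played point stays feasible.
        y = project_to_ball(y - step * g, radius - delta)
        played.append(x)
        observed.append(value)
    return np.array(played), np.array(observed)


if __name__ == "__main__":
    target = np.array([0.3, -0.2])
    costs = [lambda x: float(np.sum((x - target) ** 2))] * 1000
    played, observed = bandit_gradient_descent(costs, dim=2)
    print("mean cost, first 100 periods:", observed[:100].mean())
    print("mean cost, last 100 periods:", observed[-100:].mean())
```

For a linear cost c(x) = a·x this estimate averages to a, which gives a simple sanity check. For a general convex cost it averages to the gradient of a smoothed version of c_t rather than of c_t itself, which is consistent with the abstract calling the estimate biased.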

PUBLICATION RECORD

Publication year
2004
Venue
ACM-SIAM Symposium on Discrete Algorithms
Publication date
2004-08-02
Fields of study
Mathematics, Computer Science, Economics
Identifiers
arXiv cs/0408007

REFERENCES

Simulated Annealing in Convex Bodies and an O*(n^4) Volume Algorithm
2007, cited by this paper
Introduction to Stochastic Search and Optimization. Estimation, Simulation, and Control (Spall, J.C.; 2003) [book review]
2007, cited by this paper
Simulated annealing in convex bodies and an O*(n^4) volume algorithm
2006, influential reference
Solving convex programs by random walks
2004, cited by this paper
Adaptive routing with end-to-end feedback: distributed learning and geometric approaches
2004, cited by this paper
Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary
2004, cited by this paper
Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control
2004, cited by this paper
Nearly Tight Bounds for the Continuum-Armed Bandit Problem
2004, cited by this paper
Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control
2004, cited by this paper
Online Convex Programming and Generalized Infinitesimal Gradient Ascent
2003, cited by this paper
Introduction to stochastic search and optimization - estimation, simulation, and control
2003, cited by this paper
Simulated annealing in convex bodies and an O*(n^4) volume algorithm
2003, cited by this paper
Randomized Algorithms for Stochastic Approximation under Arbitrary Disturbances
2002, cited by this paper
Path Kernels and Multiplicative Updates
2002, cited by this paper
Efficient algorithms for universal portfolios
2000, cited by this paper
Log-Sobolev inequalities and sampling from log-concave distributions
1999, cited by this paper
On-Line Portfolio Selection Using Multiplicative Updates
1998, cited by this paper
Exponentiated Gradient Versus Gradient Descent for Linear Predictors
1997, cited by this paper
Random walks and an O*(n^5) volume algorithm for convex bodies
1997, cited by this paper
A one-measurement form of simultaneous perturbation stochastic approximation
1997, cited by this paper
Random walks and an O*(n^5) volume algorithm for convex bodies
1997, cited by this paper
Universal Portfolios
1996, cited by this paper
Gambling in a rigged casino: The adversarial multi-armed bandit problem
1995, cited by this paper
Isotropic position and inertia ellipsoids and zonoids of the unit ball of a normed n-dimensional space
1989, cited by this paper
Optimization by Simulated Annealing
1983, cited by this paper
Polynomial algorithms in linear programming
1980, cited by this paper
A polynomial algorithm in linear programming
1979, cited by this paper

CITED BY

Deterministic Zeroth-Order Mirror Descent via Vector Fields with A Posteriori Certification
2026, cites this paper
Non-Local Extremum Seeking Based on the Divergence Theorem
2026, cites this paper
A Reduction from Delayed to Immediate Feedback for Online Convex Optimization with Improved Guarantees
2026, influential citation
Improved Dimension Dependence for Bandit Convex Optimization with Gradient Variations
2026, cites this paper
Zeroth-Order Feedback-Based Optimization for Distributed Energy Management
2026, cites this paper
Learning to Price: Interpretable Attribute-Level Models for Dynamic Markets
2026, influential citation
Causal Identification in Multi-Task Demand Learning with Confounding
2026, cites this paper
Heterogeneous Distributed Zeroth-Order Nonconvex Optimization with Communication Compression
2026, cites this paper
Small Gradient Norm Regret for Online Convex Optimization
2026, cites this paper
Zeroth-Order Stackelberg Control in Combinatorial Congestion Games
2026, cites this paper
Gradient-Free Approaches is a Key to an Efficient Interaction with Markovian Stochasticity
2026, cites this paper
Asymmetric Learning in Convex Games
2026, cites this paper
Zero-Order Optimization for LLM Fine-Tuning via Learnable Direction Sampling
2026, cites this paper
Model-Free Output Feedback Stabilization via Policy Gradient Methods
2026, cites this paper
Distributed Online Convex Optimization with Efficient Communication: Improved Algorithm and Lower bounds
2026, influential citation
Distributed Forgetting-factor Regret-based Online Optimization over Undirected Connected Networks
2025, cites this paper
Refining Adaptive Zeroth-Order Optimization at Ease
2025, cites this paper
NoProp: Training Neural Networks without Full Back-propagation or Full Forward-propagation
2025, cites this paper
A Regularized Online Newton Method for Stochastic Convex Bandits with Linear Vanishing Noise
2025, cites this paper
Online bandit non-cooperative games with arbitrary delays
2025, influential citation
Risk-Averse Learning with Varying Risk Levels
2025, cites this paper
Query-Efficient Zeroth-Order Algorithms for Nonconvex Constrained Optimization
2025, cites this paper
CADKnitter: Compositional CAD Generation from Text and Geometry Guidance
2025, cites this paper
Conflict-Buffering Optimization by Symmetry Teleportation for Deep Long-Tailed Recognition
2025, cites this paper
Preference-based optimization from noisy pairwise comparisons
2025, cites this paper
ZOQO: Zero-Order Quantized Optimization
2025, cites this paper
Revisiting Projection-Free Online Learning with Time-Varying Constraints
2025, influential citation
Functional multi-armed bandit and the best function identification problems
2025, cites this paper
Greedy Algorithm for Structured Bandits: A Sharp Characterization of Asymptotic Success / Failure
2025, cites this paper
Distributed Stochastic Zeroth-Order Optimization With Compressed Communication
2025, cites this paper
Achieve Performatively Optimal Policy for Performative Reinforcement Learning
2025, cites this paper
Online bandit optimization with stochastic inequality constraints
2025, cites this paper
NoProp: Training Neural Networks without Back-propagation or Forward-propagation
2025, cites this paper
Statistical Privacy-Preserving Online Nash Equilibrium Learning with Two-Point Bandit Feedback
2025, cites this paper
On the Inherent Privacy of Zeroth-Order Projected Gradient Descent
2025, cites this paper
Zeroth-Order Optimization is Secretly Single-Step Policy Optimization
2025, influential citation
A Structured Proximal Stochastic Variance Reduced Zeroth-order Algorithm
2025, cites this paper
Zeroth-Order Optimization Finds Flat Minima
2025, cites this paper
Non-Stationary Bandit Convex Optimization: An Optimal Algorithm with Two-Point Feedback
2025, influential citation
Policy Optimization in the Linear Quadratic Gaussian Problem: A Frequency Domain Perspective
2025, cites this paper
Two-point Random Gradient-free Methods for Model-free Feedback Optimization
2025, cites this paper
Global Convergence of Policy Gradient for Entropy Regularized Linear-Quadratic Control with multiplicative noise
2025, influential citation
On the Optimal Construction of Unbiased Gradient Estimators for Zeroth-Order Optimization
2025, influential citation
UCB-type Algorithm for Budget-Constrained Expert Learning
2025, cites this paper
BOND: License to Train with Black-Box Functions
2025, cites this paper
Compressed Momentum-based Single-Point Zero-Order Algorithm for Stochastic Distributed Nonconvex Optimization
2025, cites this paper
Non-stationary Bandit Convex Optimization: A Comprehensive Study
2025, influential citation
Rewriting the Budget: A General Framework for Black-Box Attacks Under Cost Asymmetry
2025, cites this paper
Accelerating Single-Point Zeroth-Order Optimization with Regression-Based Gradient Surrogates
2025, influential citation
A Structured Tour of Optimization with Finite Differences
2025, cites this paper
On the constrained online convex optimization with feedback delay
2025, cites this paper
Adversarial bandit optimization for approximately linear functions
2025, cites this paper
ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think
2025, cites this paper
On the Convergence and Complexity of the Stochastic Central Finite-Difference Based Gradient Estimation Methods
2025, cites this paper
DT-DOFL: Digital-Twin-Empowered Decentralized Online Federated Learning for User-Centered Smart Healthcare Service Systems
2025, cites this paper
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
2025, cites this paper
Coreset-Based Task Selection for Sample-Efficient Meta-Reinforcement Learning
2025, cites this paper
A Parameter-Free and Near-Optimal Zeroth-Order Algorithm for Stochastic Convex Optimization
2025, influential citation
Model-Agnostic Meta-Policy Optimization via Zeroth-Order Estimation: A Linear Quadratic Regulator Perspective
2025, influential citation
Data-Driven Distributed Optimization via Aggregative Tracking and Deep-Learning
2025, cites this paper
Sample-Efficient Optimization over Generative Priors via Coarse Learnability
2025, cites this paper
One-Point Residual Feedback Algorithms for Distributed Online Convex and Non-Convex Optimization
2025, cites this paper
Learning Stabilizing Policies via an Unstable Subspace Representation
2025, cites this paper
BGEFL: Enabling Communication-Efficient Federated Learning via Bandit Gradient Estimation in Resource-Constrained Networks
2025, cites this paper
Bregman Linearized Augmented Lagrangian Method for Nonconvex Constrained Stochastic Zeroth-order Optimization
2025, cites this paper
Online Nonsubmodular Optimization with Delayed Feedback in the Bandit Setting
2025, cites this paper
Online Episodic Convex Reinforcement Learning
2025, cites this paper
Communication-Efficient Distributed Online Nonconvex Optimization with Time-Varying Constraints
2025, cites this paper
Balancing accuracy and convergence rate: a hybrid optimisation algorithm for parameter identification of unmanned marine vehicles
2025, influential citation
VAMO: Efficient Zeroth-Order Variance Reduction for SGD with Faster Convergence
2025, cites this paper
Estimating the Effects of Sample Training Orders for Large Language Models without Retraining
2025, cites this paper
Privacy Amplification in Differentially Private Zeroth-Order Optimization with Hidden States
2025, cites this paper
Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop Networks
2025, cites this paper
Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order
2025, cites this paper
Distributed policy evaluation over multi-agent network with communication delays
2025, cites this paper
Quantization Enabled Differential Privacy in Bandit Games With Cooperative Players
2025, influential citation
An Introduction to Zero-Order Optimization Techniques for Robotics
2025, cites this paper
Scaling Inference Time Compute for Diffusion Models
2025, cites this paper
DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing
2025, cites this paper
A derivative-free regularization algorithm for equality constrained nonlinear least squares problems
2025, cites this paper
Inexact zeroth-order nonsmooth and nonconvex stochastic composite optimization and applications
2025, influential citation
Revisiting Randomized Smoothing: Nonsmooth Nonconvex Optimization Beyond Global Lipschitz Continuity
2025, cites this paper
Variance Reduced Smoothed Functional REINFORCE Policy Gradient Algorithms
2025, cites this paper
Bayesian Optimization for Online Bandit Model Partitioning in Split Federated Learning
2025, cites this paper
The Multi-Query Paradox in Zeroth-Order Optimization
2025, cites this paper
One-Point Sampling for Distributed Bandit Convex Optimization With Time-Varying Constraints
2025, influential citation
Zeroth-Order Sharpness-Aware Learning with Exponential Tilting
2025, cites this paper
Query-Efficient Zeroth-Order Algorithms for Nonconvex Optimization
2025, cites this paper
Finite-Time Analysis of Stochastic Nonconvex Nonsmooth Optimization on the Riemannian Manifolds
2025, cites this paper
Learning Local Stackelberg Equilibria from Repeated Interactions with a Learning Agent
2025, cites this paper
Unifying Zeroth-Order Optimization and Genetic Algorithms for Reinforcement Learning
2025, cites this paper
Self-Concordant Perturbations for Linear Bandits
2025, influential citation
Zeroth-order gradient estimators for stochastic problems with decision-dependent distributions
2025, influential citation
ConMeZO: Adaptive Descent-Direction Sampling for Gradient-Free Finetuning of Large Language Models
2025, cites this paper
ComPO: Preference Alignment via Comparison Oracles
2025, cites this paper
Efficient Controllable Diffusion via Optimal Classifier Guidance
2025, cites this paper
A Zeroth-Order Extra-Gradient Method For Black-Box Constrained Optimization
2025, cites this paper
Optimistic Feasible Search for Closed-Loop Fair Threshold Decision-Making
2025, cites this paper
ReLIZO: Sample Reusable Linear Interpolation-based Zeroth-order Optimization
2024, cites this paper
Convergence to Equilibrium of No-Regret Dynamics in Congestion Games
2024, cites this paper