Lower Bound On the Computational Complexity of Discounted Markov Decision Problems

Published 2017 in arXiv.org

ABSTRACT

We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space $\mathcal{S}$ and a finite action space $\mathcal{A}$. We show that any randomized algorithm requires running time at least $\Omega(|\mathcal{S}|^2|\mathcal{A}|)$ to compute an $\epsilon$-optimal policy with high probability. We also consider two variants of the MDP in which the input is given in specific data structures, including arrays of cumulative probabilities and binary trees of transition probabilities. For these cases, we show that the complexity lower bound reduces to $\Omega\left( \frac{|\mathcal{S}| |\mathcal{A}|}{\epsilon} \right)$. These results show, perhaps surprisingly, that the computational complexity of the MDP depends on the data structure in which the input is given.
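To make the "arrays of cumulative probabilities" input format concrete, here is a minimal sketch, not taken from the paper. It assumes each transition row $p(\cdot \mid s, a)$ is stored as its running sums, so a next state can be drawn with one binary search instead of a linear scan over all $|\mathcal{S}|$ states. The function names `to_cumulative` and `sample_next_state` are hypothetical and chosen only for illustration.

```python
import bisect
import random
from itertools import accumulate


def to_cumulative(probs):
    """Turn one transition row p(. | s, a) into its array of running sums."""
    cum = list(accumulate(probs))
    cum[-1] = 1.0  # guard against floating-point drift in the final entry
    return cum


def sample_next_state(cum, u):
    """Return the smallest index j with cum[j] > u, for u in [0, 1).

    bisect_right is a binary search, so this takes O(log n) comparisons
    for a row over n states.
    """
    return bisect.bisect_right(cum, u)


# Illustrative row over four states (assumed values, not from the paper).
row = [0.1, 0.2, 0.3, 0.4]
cum = to_cumulative(row)

rng = random.Random(0)
draws = [sample_next_state(cum, rng.random()) for _ in range(10_000)]
```

States with zero probability repeat the previous cumulative value and are therefore never selected. Whether fast sampling of this kind is what drives the smaller lower bound is an interpretation of the abstract, not a claim the abstract makes.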

PUBLICATION RECORD

Publication year
2017
Venue
arXiv.org
Publication date
2017-05-20
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1705.07312
Source metadata
Semantic Scholar

CITED BY

Quantum Algorithms for Finite-horizon Markov Decision Processes
2025, influential citation
On the hardness of RL with Lookahead
2025, cites this paper
Adaptive VNF Placement Considering Overall Latency and 5G Wireless Channel Reliability in Industry 4.0: A Reinforcement Learning Based Approach
2024, cites this paper
Control of Fab Lifters via Deep Reinforcement Learning: A Semi-MDP Approach
2024, cites this paper
Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions
2022, cites this paper
A Comment on “Using Randomization to Break the Curse of Dimensionality”
2022, cites this paper
On Two Sensors Scheduling for Remote State Estimation With a Shared Memory Channel in a Cyber-Physical System Environment
2021, cites this paper
On the limits of using randomness to break a dynamic program’s curse of dimensionality
2020, cites this paper
Generalization Bounds for Stochastic Saddle Point Problems
2020, cites this paper
Beating the curse of dimensionality in optimal stopping
2020, cites this paper
An Optimal Dynamic Admission Control Policy and Upper Bound Analysis in Wireless Sensor Networks
2019, cites this paper
The Problem of Dynamic Programming on a Quantum Computer
2019, cites this paper
Quantum Algorithms for Solving Dynamic Programming Problems
2019, cites this paper
Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes
2018, cites this paper
Variance reduced value iteration and faster algorithms for solving Markov decision processes
2017, cites this paper
Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Running Time
2017, cites this paper
Primal-Dual π Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems
2017, cites this paper
Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear Running Time
2017, cites this paper