Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games
Yuling Yan, Gen Li, Yuxin Chen, Jianqing Fan
Published 2022 in Operations Research
ABSTRACT
This paper makes progress toward learning Nash equilibria in two-player, zero-sum Markov games from offline data. Despite a large number of prior works tackling this problem, the state-of-the-art results suffer from the curse of multiple agents, in the sense that their sample complexity bounds scale linearly with the total number of joint actions. This paper proposes a new model-based algorithm that provably finds an approximate Nash equilibrium with a sample complexity scaling linearly with the total number of individual actions. The work also develops a matching minimax lower bound, establishing the minimax optimality of the proposed algorithm over a broad regime of interest. An appealing feature of the result is its algorithmic simplicity, which shows that sophisticated variance reduction and sample splitting are unnecessary for achieving sample optimality.
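To make the scaling claim concrete, the contrast can be sketched in the standard notation for this setting (assumed here, since the record does not supply it): S states, with the two players holding A and B individual actions respectively, so the game has A·B joint action pairs. Assuming a γ-discounted formulation with target accuracy ε, and suppressing the exact horizon and accuracy dependence as a generic polynomial factor, the improvement reads
\[
\underbrace{\widetilde{O}\!\left(S\,A B \cdot \mathrm{poly}\!\left(\tfrac{1}{1-\gamma},\, \tfrac{1}{\varepsilon}\right)\right)}_{\text{prior art: joint actions}}
\;\longrightarrow\;
\underbrace{\widetilde{O}\!\left(S\,(A+B) \cdot \mathrm{poly}\!\left(\tfrac{1}{1-\gamma},\, \tfrac{1}{\varepsilon}\right)\right)}_{\text{this paper: individual actions}}
\]
Since A + B is far smaller than A·B whenever both action sets are large, this is precisely what it means to break the curse of multiple agents.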
PUBLICATION RECORD
- Publication year
2022
- Venue
Operations Research
- Publication date
2022-06-08
- Fields of study
Mathematics, Computer Science
- Source metadata
Semantic Scholar