Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games
Yuling Yan, Gen Li, Yuxin Chen, Jianqing Fan
Published 2022 in Operations Research
ABSTRACT
This paper makes progress toward learning Nash equilibria in two-player, zero-sum Markov games from offline data. Despite a large number of prior works tackling this problem, the state-of-the-art results suffer from the curse of multiple agents, in the sense that their sample complexity bounds scale linearly with the total number of joint actions. This paper proposes a new model-based algorithm that provably finds an approximate Nash equilibrium with a sample complexity scaling linearly with the total number of individual actions. The work also develops a matching minimax lower bound, establishing the minimax optimality of the proposed algorithm over a broad regime of interest. An appealing feature of the result is its algorithmic simplicity, which shows that sophisticated variance reduction and sample splitting are unnecessary for achieving sample optimality.
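To make the scaling claim concrete, the contrast can be sketched in the standard notation for this setting (assumed here, since the record does not supply it): S states, with the two players holding A and B individual actions respectively, so the game has A·B joint action pairs. Assuming a γ-discounted formulation with target accuracy ε, and suppressing the exact horizon and accuracy dependence as a generic polynomial factor, the improvement reads
\[
\underbrace{\widetilde{O}\!\left(S\,A B \cdot \mathrm{poly}\!\left(\tfrac{1}{1-\gamma},\, \tfrac{1}{\varepsilon}\right)\right)}_{\text{prior art: joint actions}}
\;\longrightarrow\;
\underbrace{\widetilde{O}\!\left(S\,(A+B) \cdot \mathrm{poly}\!\left(\tfrac{1}{1-\gamma},\, \tfrac{1}{\varepsilon}\right)\right)}_{\text{this paper: individual actions}}
\]
Since A + B is far smaller than A·B whenever both action sets are large, this is precisely what it means to break the curse of multiple agents.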
PUBLICATION RECORD
- Publication year
2022
- Venue
Operations Research
- Publication date
2022-06-08
- Fields of study
Mathematics, Computer Science
- Source metadata
Semantic Scholar