On the Benefits of Leveraging Structural Information in Planning Over the Learned Model

Jiajun Shen, K. Kuwaranancharoen, Raid Ayoub, Pietro Mercati, S. Sundaram

Published 2023 in American Control Conference

ABSTRACT

Model-based Reinforcement Learning (RL) integrates learning and planning and has received increasing attention in recent years. However, learning the model can incur a significant cost (in terms of sample complexity), due to the need to obtain a sufficient number of samples for each state-action pair. In this paper, we investigate the benefits of leveraging structural information about the system in terms of reducing sample complexity. Specifically, we consider the setting where the transition probability matrix is a known function of a number of structural parameters, whose values are initially unknown. We then consider the problem of estimating those parameters based on the interactions with the environment. We characterize the difference between the Q estimates and the optimal Q value as a function of the number of samples. Our analysis shows that there can be a significant saving in sample complexity by leveraging structural information about the model. We illustrate the findings by considering how to control a queuing system with heterogeneous servers.
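
The intuition behind the abstract's claim can be illustrated with a toy sketch. The code below is not the paper's algorithm or its queueing model: it assumes a hypothetical discrete-time queue with capacity `K`, one arrival probability `LAM`, and two servers with service probabilities `MU`, where the action picks the server. All names and numeric values are illustrative assumptions. It compares two ways of learning the transition matrix from the same samples: estimating the three structural parameters by pooling every sample, versus estimating each state-action row independently from empirical counts. It only measures model error, not the Q-value error analyzed in the paper.

```python
import random

# Hypothetical toy setup (not taken from the paper).
K = 10            # queue capacity, states are queue lengths 0..K
LAM = 0.3         # true arrival probability per step
MU = (0.2, 0.4)   # true service probability of server 0 and server 1


def transition_row(s, a, lam, mu, k):
    """Known structural map: parameters (lam, mu) -> row P(. | s, a)."""
    row = [0.0] * (k + 1)
    row[min(s + 1, k)] += lam          # arrival (blocked when the queue is full)
    row[max(s - 1, 0)] += mu[a]        # departure (no effect when empty)
    row[s] += 1.0 - lam - mu[a]        # no event
    return row


def sample_next(s, a, rng):
    """Draw one next state from the true model (a generative-model call)."""
    u = rng.random()
    if u < LAM:
        return min(s + 1, K)
    if u < LAM + MU[a]:
        return max(s - 1, 0)
    return s


def collect(n, rng):
    """n samples for every state-action pair."""
    return [(s, a, sample_next(s, a, rng))
            for s in range(K + 1) for a in range(2) for _ in range(n)]


def estimate_structured(samples):
    """Pool all samples to estimate the three structural parameters."""
    arr = arr_n = 0
    dep, dep_n = [0, 0], [0, 0]
    for s, a, s2 in samples:
        if s < K:                      # arrivals are observable below capacity
            arr_n += 1
            arr += (s2 == s + 1)
        if s > 0:                      # departures are observable when non-empty
            dep_n[a] += 1
            dep[a] += (s2 == s - 1)
    return arr / arr_n, tuple(dep[a] / dep_n[a] for a in range(2))


def estimate_unstructured(samples, n):
    """Estimate every row separately from its own n samples."""
    counts = {(s, a): [0] * (K + 1) for s in range(K + 1) for a in range(2)}
    for s, a, s2 in samples:
        counts[(s, a)][s2] += 1
    return {key: [c / n for c in row] for key, row in counts.items()}


def max_l1_error(p_hat):
    """Largest L1 distance between an estimated row and the true row."""
    return max(
        sum(abs(x - y) for x, y in zip(p_hat[(s, a)],
                                       transition_row(s, a, LAM, MU, K)))
        for s in range(K + 1) for a in range(2))


def run(n=50, seed=0):
    rng = random.Random(seed)
    samples = collect(n, rng)
    lam_hat, mu_hat = estimate_structured(samples)
    structured = {(s, a): transition_row(s, a, lam_hat, mu_hat, K)
                  for s in range(K + 1) for a in range(2)}
    unstructured = estimate_unstructured(samples, n)
    return max_l1_error(structured), max_l1_error(unstructured)


if __name__ == "__main__":
    err_s, err_u = run()
    print("structured max L1 error:  ", round(err_s, 4))
    print("unstructured max L1 error:", round(err_u, 4))
```

In this sketch each structural parameter is estimated from hundreds of pooled samples, while each unstructured row sees only `n` of them, which is why the structured model error tends to be much smaller for the same sampling budget.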

PUBLICATION RECORD

Publication year: 2023
Venue: American Control Conference
Publication date: 2023-03-15
Fields of study: Computer Science, Engineering
DOI: 10.23919/ACC55779.2023.10155973
arXiv: 2303.08856
Source metadata: Semantic Scholar
