On State Variables, Bandit Problems and POMDPs

Published 2020 in arXiv.org

ABSTRACT

State variables are easily the most subtle dimension of sequential decision problems. This is especially true in the context of active learning problems ("bandit problems") where decisions affect what we observe and learn. We describe our canonical framework that models any sequential decision problem, and present our definition of state variables that allows us to claim: Any properly modeled sequential decision problem is Markovian. We then present a novel two-agent perspective of partially observable Markov decision problems (POMDPs) that allows us to then claim: Any model of a real decision problem is (possibly) non-Markovian. We illustrate these perspectives using the context of observing and treating flu in a population, and provide examples of all four classes of policies in this setting. We close with an indication of how to extend this thinking to multiagent problems.
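The claim that a properly modeled active learning problem is Markovian can be illustrated with a small sketch. It is not taken from the paper: it assumes a multi-armed bandit with normally distributed rewards of known noise precision, and it treats the belief about each arm (a mean and a precision) as the state variable. The names `BeliefState`, `policy`, `transition`, and `simulate`, and the tunable `bonus` parameter, are hypothetical choices made for this illustration only.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BeliefState:
    """Belief about each arm: estimated mean and precision (1 / variance)."""
    means: tuple
    precisions: tuple


def policy(state, bonus=2.0):
    """Pick the arm with the highest mean plus an uncertainty bonus.

    The decision depends only on the current belief state.
    """
    scores = [m + bonus / (p ** 0.5)
              for m, p in zip(state.means, state.precisions)]
    return max(range(len(scores)), key=scores.__getitem__)


def transition(state, arm, observation, obs_precision=1.0):
    """Bayesian update of the chosen arm (normal prior, known noise).

    The next state is a function of the current state, the decision,
    and the new observation only, so the belief process is Markovian.
    """
    means = list(state.means)
    precisions = list(state.precisions)
    new_precision = precisions[arm] + obs_precision
    means[arm] = (precisions[arm] * means[arm]
                  + obs_precision * observation) / new_precision
    precisions[arm] = new_precision
    return BeliefState(tuple(means), tuple(precisions))


def simulate(true_means, horizon, seed=0):
    """Run the policy against hidden true means; return final belief and reward."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    state = BeliefState(means=(0.0,) * n_arms, precisions=(1.0,) * n_arms)
    total_reward = 0.0
    for _ in range(horizon):
        arm = policy(state)
        # The true mean is hidden from the policy; it only sees the sample.
        observation = rng.gauss(true_means[arm], 1.0)
        total_reward += observation
        state = transition(state, arm, observation)
    return state, total_reward


if __name__ == "__main__":
    final_state, reward = simulate(true_means=(0.2, 0.8, 0.5), horizon=50)
    print(final_state, reward)
```

In this sketch the true arm means never enter the state; only the belief does. That is the sense in which the learning problem stays Markovian in the belief state even though the underlying truth is unobserved.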

PUBLICATION RECORD

Publication year: 2020
Venue: arXiv.org
Publication date: 2020-02-14
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 2002.06238
Source metadata: Semantic Scholar

CITED BY

Reinforcement Learning and Stochastic Optimization (2022)
Probabilistic design of optimal sequential decision-making algorithms in learning and control (2022)
Smart Online Charging Algorithm for Electric Vehicles via Customized Actor–Critic Learning (2021)
Robust Charging Schedule for Autonomous Electric Vehicles With Uncertain Covariates (2021)
A Gentle Lecture Note on Filtrations in Reinforcement Learning (2020)