Pessimistic Iterative Planning with RNNs for Robust POMDPs

Maris F. L. Galesloot, Marnix Suilen, T. D. Simão, Steven Carr, M. Spaan, U. Topcu, Nils Jansen

Published 2024 in European Conference on Artificial Intelligence

ABSTRACT

Robust POMDPs extend classical POMDPs to incorporate model uncertainty using so-called uncertainty sets on the transition and observation functions, effectively defining ranges of probabilities. Policies for robust POMDPs must be (1) memory-based to account for partial observability and (2) robust against model uncertainty to account for the worst-case probability instances from the uncertainty sets. To compute such robust memory-based policies, we propose the pessimistic iterative planning (PIP) framework, which alternates between (1) selecting pessimistic POMDPs via worst-case probability instances from the uncertainty sets, and (2) computing finite-state controllers (FSCs) for these pessimistic POMDPs. Within PIP, we propose the rFSCNet algorithm, which optimizes a recurrent neural network to compute the FSCs. The empirical evaluation shows that rFSCNet can compute better-performing robust policies than several baselines and a state-of-the-art robust POMDP solver.
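The abstract combines two ingredients: worst-case probability instances drawn from uncertainty sets, and finite-state controllers (FSCs). The small NumPy sketch below shows how they interact when a fixed FSC is evaluated against the worst case, loosely in the spirit of PIP's pessimistic step. It is not the paper's procedure and does not reproduce rFSCNet. It assumes interval uncertainty sets on transitions only, a deterministic observation per state, a deterministic FSC and discounted rewards. The names `worst_case_distribution` and `robust_fsc_value` and the array encoding are hypothetical. Robust value iteration runs on the product of model and controller, and the inner minimisation over an interval set uses the standard greedy rule.

```python
import numpy as np


def worst_case_distribution(lo, hi, values):
    """Return p minimising p @ values s.t. lo <= p <= hi and sum(p) == 1.

    Greedy rule for interval sets (assumes sum(lo) <= 1 <= sum(hi)):
    start at the lower bounds, then pour the remaining probability mass
    onto the successors with the smallest value first.
    """
    p = lo.copy()
    budget = 1.0 - lo.sum()
    for i in np.argsort(values):  # lowest-valued successors first
        extra = min(hi[i] - lo[i], budget)
        p[i] += extra
        budget -= extra
        if budget <= 0.0:
            break
    return p


def robust_fsc_value(T_lo, T_hi, R, obs, act, upd, gamma=0.95, iters=500):
    """Worst-case discounted value of a deterministic FSC on an interval POMDP.

    Hypothetical encoding, for illustration only:
      T_lo, T_hi : arrays [S, A, S] of transition probability bounds
      R          : array [S, A] of rewards
      obs        : array [S], the deterministic observation of each state
      act, upd   : arrays [N, Z], FSC action and next memory node
    Nature picks a distribution per product state (s, n) at every step,
    which can only lower (never raise) the value compared with a single
    choice per (s, a), so the result is a conservative estimate.
    """
    S, N = R.shape[0], act.shape[0]
    V = np.zeros((S, N))
    for _ in range(iters):
        V_new = np.empty_like(V)
        for s in range(S):
            z = obs[s]
            for n in range(N):
                a, n_next = act[n, z], upd[n, z]
                succ_vals = V[:, n_next]  # value of each successor state
                p = worst_case_distribution(T_lo[s, a], T_hi[s, a], succ_vals)
                V_new[s, n] = R[s, a] + gamma * p @ succ_vals
        V = V_new
    return V
```

For example, take two states with rewards 1 and 0, one action, a one-node controller, and transitions from either state reaching state 0 with probability in [0.3, 0.8] and state 1 with probability in [0.2, 0.7]. With `gamma=0.9`, the adversary pushes as much mass as allowed onto the low-value state, and the sketch returns values of about 3.7 and 2.7.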

PUBLICATION RECORD

Publication year
2024
Venue
European Conference on Artificial Intelligence
Publication date
2024-08-16
Fields of study
Computer Science
Identifiers
DOI 10.3233/FAIA251391
arXiv 2408.08770
Source metadata
Semantic Scholar

REFERENCES

rfPG: Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs
2025 · cited by this paper
Imitation Learning: A Survey of Learning Methods, Environments and Metrics
2024 · cited by this paper
Imprecise Probabilities Meet Partial Observability: Game Semantics for Robust POMDPs
2024 · influential reference
Robust Markov Decision Processes: A Place Where AI and Formal Methods Meet
2024 · cited by this paper
Recurrent networks, hidden states and beliefs in partially observable environments
2022 · cited by this paper
Safe Reinforcement Learning via Shielding under Partial Observability
2022 · cited by this paper
Learning Finite State Models from Recurrent Neural Networks
2022 · cited by this paper
The Stackelberg Equilibrium for One-sided Zero-sum Partially Observable Stochastic Games
2021 · cited by this paper
Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes
2021 · influential reference
Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs
2021 · cited by this paper
Robust Policy Synthesis for Uncertain POMDPs via Convex Optimization
2020 · cited by this paper
Robust Finite-State Controllers for Uncertain POMDPs
2020 · influential reference
Enforcing Almost-Sure Reachability in POMDPs
2020 · cited by this paper
Larq: An Open-Source Library for Training Binarized Neural Networks
2020 · cited by this paper
The probabilistic model checker Storm
2020 · cited by this paper
Distributionally Robust Partially Observable Markov Decision Process with Moment-Based Ambiguity
2019 · cited by this paper
Learning Finite State Representations of Recurrent Policy Networks
2018 · influential reference
Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs
2018 · cited by this paper
Finite-State Controllers of POMDPs using Parameter Synthesis
2018 · cited by this paper
Model-based reinforcement learning: A survey
2018 · cited by this paper
Robust Action Selection in Partially Observable Markov Decision Processes with Model Uncertainty
2018 · cited by this paper
Domain randomization for transferring deep neural networks from simulation to the real world
2017 · cited by this paper
Partially Observable Markov Decision Processes
2017 · cited by this paper
TensorFlow: A system for large-scale machine learning
2016 · cited by this paper
Energy Efficient Execution of POMDP Policies
2015 · cited by this paper
Robust partially observable Markov decision process
2015 · influential reference
Qualitative analysis of POMDPs with temporal logic specifications for robotics applications
2014 · cited by this paper
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
2014 · cited by this paper
Optimal cost almost-sure reachability in POMDPs
2014 · cited by this paper
Policy iteration for bounded-parameter POMDPs
2013 · cited by this paper
On the complexity of model checking interval-valued discrete time Markov chains
2013 · cited by this paper
Robust Markov Decision Processes
2013 · cited by this paper
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
2013 · cited by this paper
Automated Verification Techniques for Probabilistic Systems
2011 · cited by this paper
The divergence of reinforcement learning algorithms with value-iteration and function approximation
2011 · cited by this paper
Monte-Carlo Planning in Large POMDPs
2010 · cited by this paper
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
2010 · cited by this paper
Solving POMDPs: RTDP-Bel vs. Point-based Algorithms
2009 · cited by this paper
SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces
2008 · cited by this paper
Bounded-Parameter Partially Observable Markov Decision Processes
2008 · cited by this paper
Principles of model checking
2008 · cited by this paper
Game Theory: A Multi-Leveled Approach
2008 · cited by this paper
k-means++: the advantages of careful seeding
2007 · cited by this paper
Partially observable Markov decision processes with imprecise parameters
2007 · cited by this paper
Sampling-Based Motion Planning With Sensing Uncertainty
2007 · cited by this paper
Solving Deep Memory POMDPs with Recurrent Policy Gradients
2007 · cited by this paper
Policy Gradient Methods for Robotics
2006 · cited by this paper
Dynamic programming and optimal control, 3rd Edition
2005 · cited by this paper
Robust Dynamic Programming
2005 · influential reference
Robust Control of Markov Decision Processes with Uncertain Transition Matrices
2005 · influential reference
Heuristic Search Value Iteration for POMDPs
2004 · cited by this paper
On the undecidability of probabilistic planning and related stochastic optimization problems
2003 · cited by this paper
Bounded Finite State Controllers
2003 · cited by this paper
Policy-Gradient Algorithms for Partially Observable Markov Decision Processes
2003 · cited by this paper
Probabilistic robotics
2002 · cited by this paper
Markov decision processes with uncertain transition rates: sensitivity and robust control
2002 · cited by this paper
Interval arithmetic: From principles to implementation
2001 · influential reference
Value-Function Approximations for Partially Observable Markov Decision Processes
2000 · cited by this paper
Stochastic Shortest Path Games
1999 · influential reference
Solving POMDPs by Searching the Space of Finite Policies
1999 · cited by this paper
Planning and Acting in Partially Observable Stochastic Domains
1998 · cited by this paper
Long Short-Term Memory
1997 · cited by this paper
Learning Policies for Partially Observable Environments: Scaling Up
1997 · cited by this paper
Stochastic and shortest path games: theory and algorithms
1997 · influential reference
An Improved Policy Iteration Algorithm for Partially Observable MDPs
1997 · cited by this paper
Extraction of rules from discrete-time recurrent neural networks
1996 · cited by this paper
Learning Finite State Machines With Self-Clustering Recurrent Networks
1993 · cited by this paper
An Analysis of Stochastic Shortest Path Problems
1991 · cited by this paper
Backpropagation Through Time: What It Does and How to Do It
1990 · cited by this paper
WEIGHT
1976 · cited by this paper
Optimal control of Markov processes with incomplete state information
1965 · cited by this paper

CITED BY

Finite-State Controllers for (Hidden-Model) POMDPs using Deep Reinforcement Learning
2026 · cites this paper
rfPG: Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs
2025 · cites this paper
Multi-Environment POMDPs: Discrete Model Uncertainty Under Partial Observability
2025 · cites this paper
Robust Markov Decision Processes: A Place Where AI and Formal Methods Meet
2024 · cites this paper