Anytime Point-Based Approximations for Large POMDPs

Joelle Pineau, Geoffrey J. Gordon, S. Thrun

Published 2006 in Journal of Artificial Intelligence Research

ABSTRACT

The Partially Observable Markov Decision Process (POMDP) has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However, exact solutions in this framework are typically computationally intractable for all but the smallest problems. A well-known technique for speeding up POMDP solving involves performing value backups at specific belief points, rather than over the entire belief simplex. The efficiency of this approach, however, depends greatly on the selection of points. This paper presents a set of novel techniques for selecting informative belief points that work well in practice. The point selection procedure is combined with point-based value backups to form an effective anytime POMDP algorithm called Point-Based Value Iteration (PBVI). The first aim of this paper is to introduce this algorithm and present a theoretical analysis justifying the choice of belief selection technique. The second aim is to provide a thorough empirical comparison between PBVI and other state-of-the-art POMDP methods, in particular the Perseus algorithm, in an effort to highlight their similarities and differences. Evaluation is performed using both standard POMDP domains and realistic robotic tasks.
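
The value backup at a single belief point, which the abstract refers to, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `point_based_backup`, the nested-list layout of the model (`T[a][s][s2]`, `O[a][s2][o]`, `R[a][s]`), and the toy two-state problem are all assumptions made for illustration. The sketch follows the standard textbook form of a point-based backup: for each action and observation, pick the existing alpha vector that is best at the given belief once projected back through the model, then keep the action whose resulting vector scores highest at that belief.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def point_based_backup(belief: Vector, alphas: List[Vector],
                       T, O, R, gamma: float) -> Vector:
    """Return one new alpha vector, backed up at a single belief point.

    Assumed (hypothetical) model layout:
      T[a][s][s2] = P(s2 | s, a)   transition probabilities
      O[a][s2][o] = P(o | s2, a)   observation probabilities
      R[a][s]     = immediate reward for taking action a in state s
    """
    n_states = len(belief)
    n_obs = len(O[0][0])
    best_vec, best_val = None, float("-inf")
    for a in range(len(T)):
        vec = list(R[a])  # start from the immediate reward
        for o in range(n_obs):
            # Project every existing alpha vector back through (a, o).
            projected = [
                [sum(T[a][s][s2] * O[a][s2][o] * alpha[s2]
                     for s2 in range(n_states))
                 for s in range(n_states)]
                for alpha in alphas
            ]
            # Keep only the projection that is best at this belief.
            best_proj = max(projected, key=lambda g: dot(belief, g))
            vec = [v + gamma * g for v, g in zip(vec, best_proj)]
        val = dot(belief, vec)
        if val > best_val:
            best_vec, best_val = vec, val
    return best_vec


# Toy problem (assumed): 2 states, 2 actions, 2 observations.
identity = [[1.0, 0.0], [0.0, 1.0]]
uniform_obs = [[0.5, 0.5], [0.5, 0.5]]
T = [identity, identity]        # states never change
O = [uniform_obs, uniform_obs]  # observations carry no information
R = [[1.0, 0.0], [0.0, 1.0]]    # action a pays off only in state a

first = point_based_backup([0.9, 0.1], [[0.0, 0.0]], T, O, R, gamma=0.5)
second = point_based_backup([0.9, 0.1], [first], T, O, R, gamma=0.5)
print(first, second)
```

Because only one vector is produced per belief point, the cost of a backup grows with the number of belief points rather than with the size of the full belief simplex, which is why the choice of points matters so much.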

PUBLICATION RECORD

Publication year: 2006
Venue: Journal of Artificial Intelligence Research
Publication date: 2006-09-01
Fields of study: Mathematics, Computer Science
Identifiers: DOI 10.1613/jair.2078; arXiv 1110.0027
Source metadata: Semantic Scholar
