Learning and acting in unknown and uncertain worlds

Published 2011 in Unknown venue

ABSTRACT

This dissertation addresses the problem of learning to act in an unknown and uncertain world. This is a difficult problem. Even if a world model is available, an assumption not made here, learning an optimal policy for controlling behaviour is known to be intractable (Littman 1996). Assuming no world model is known leads to two approaches: model-free learning, which attempts to learn to act without a model of the environment, and model learning, which attempts to learn a model of the environment from interactions with the world. Most earlier approaches make a priori assumptions about the complexity of the model or policy required, the upshot of which is that a fixed amount of memory is available to the agent. It is well known that in a noisy environment, the type assumed here, an environment-specific amount of memory is required to act optimally. Fixing the capacity of memory before any interactions have occurred is thus a limiting assumption.

The theme of this dissertation is that representing multiple policies or environment models of varying size enables us to address this problem. Both model-free learning and model learning are investigated. For the former, I present a policy search method (usable with a wide range of algorithms) that maintains a population of policies of varying size. By sharing information between policies, I show that it can learn near-optimal policies for a variety of challenging problems, and that performance is significantly improved over using the same amount of computation without information sharing.

I investigate two approaches to model learning. The first is a variational Bayesian method for learning POMDPs. I show that it achieves superior results to the Bayes-adaptive algorithm (Ross, Chaib-draa and Pineau 2007) using their experimental setup. However, this experimental setup makes strong assumptions about prior information, and I show that weakening these assumptions leads to poor performance. I then address model learning for a simpler model, a topological map. I develop a novel non-parametric Bayesian map that sets no limit on the model size, and show experimentally that maps can be learned from robot data with weak prior knowledge.

PUBLICATION RECORD

Publication year
2011
Venue
Unknown venue
Publication date
2011-07-01
Fields of study
Computer Science
Source metadata
Semantic Scholar

REFERENCES

Probabilistic Inference Using Markov Chain Monte Carlo Methods (2011)
Recurrent policy gradients (2010)
The IBP Compound Dirichlet Process and its Application to Focused Topic Modeling (2010)
Model-based reinforcement learning with nearly tight exploration complexity bounds (2010)
Evolving neural networks in compressed weight space (2010)
Bayesian Inference in Monte-Carlo Tree Search (2010)
Scaling the iHMM: Parallelization versus Hadoop (2010)
Factoring the Mapping Problem: Mobile Robot Map-building in the Hybrid Spatial Semantic Hierarchy (2010)
Deterministic Initialization of Hidden Markov Models for Human Action Recognition (2009)
Bayesian surprise and landmark detection (2009)
Closing the learning-planning loop with predictive state representations (2009)
Focused Topic Models (2009)
Real-time correlative scan matching (2009)
Anticipatory Learning Classifier Systems and Factored Reinforcement Learning (2009)
Notes on the OpenSURF Library (2009)
A Spectral Algorithm for Learning Hidden Markov Models (2009)
Predictive representations for policy gradient in POMDPs (2009)
Highly scalable appearance-only SLAM - FAB-MAP 2.0 (2009)
Topological SLAM using neighbourhood information of places (2009)
Keypoint design and evaluation for place recognition in 2D lidar maps (2009)
The Infinite Partially Observable Markov Decision Process (2009)
Variational Bayesian Analysis for Hidden Markov Models (2009)
Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection (2009)
Evolving neural networks for fractured domains (2008)
Beam sampling for the infinite hidden Markov model (2008, influential reference)
Tree Exploration for Bayesian RL Exploration (2008)
Learning Hidden Markov Models Using Nonnegative Matrix Factorization (2008)
Accelerated Neural Evolution through Cooperatively Coevolved Synapses (2008)
A New Architecture for Learning Classifier Systems to Solve POMDP Problems (2008)
Near-optimal Regret Bounds for Reinforcement Learning (2008)
Online Planning Algorithms for POMDPs (2008)
FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance (2008)
Model-based Bayesian Reinforcement Learning in Partially Observable Domains (2008)
Policy-Gradients for PSRs and POMDPs (2007)
Incremental Spectral Clustering and Its Application To Topological Mapping (2007)
A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms (2007)
Bayes-Adaptive POMDPs (2007, influential reference)
Hierarchical Beta Processes and the Indian Buffet Process (2007)
Learning Partially Observable Markov Models from First Passage Times (2007, influential reference)
Stick-breaking Construction for the Indian Buffet Process (2007)
A Rao-Blackwellized particle filter for topological mapping (2006)
Planning algorithms (2006)
The Expectation Maximization Algorithm: A short tutorial (2006)
Probabilistic inference for solving (PO)MDPs (2006)
Hierarchical Dirichlet Processes (2006)
Consistent observation grouping for generating metric-topological maps that improves robot localization (2006)
Generalizing Dijkstra's Algorithm and Gaussian Elimination for Solving MDPs (2005)
Infinite latent feature models and the Indian buffet process (2005)
Data driven MCMC for Appearance-based Topological Mapping (2005)
Universal Artificial Intelligence (2004)
Predictive State Representations: A New Theory for Modeling Dynamical Systems (2004)
Resolving Perceptual Aliasing In The Presence Of Noisy Sensors (2004)
Learning first-order Markov models for control (2004)
Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences (2004)
Distinctive Image Features from Scale-Invariant Keypoints (2004)
Utile distinction hidden Markov models (2004)
Automatic State Construction using Decision Tree for Reinforcement Learning Agents (2004)
An Introduction to MCMC for Machine Learning (2004)
Inference in the space of topological maps: an MCMC-based approach (2004)
Heuristic Search Value Iteration for POMDPs (2004, influential reference)
Comparing and evaluating HMM ensemble training algorithms using train and test and condition number criteria (2003)
Policy-Gradient Algorithms for Partially Observable Markov Decision Processes (2003, influential reference)
Covariant policy search (2003)
Design for an Optimal Probe (2003)
Reinforcement Learning for Humanoid Robotics (2003)
Variational algorithms for approximate Bayesian inference (2003)
Learning in a State of Confusion: Perceptual Aliasing in Grid World Navigation (2003)
Optimal Ordered Problem Solver (2002)
Efficient Reinforcement Learning Through Evolving Neural Network Topologies (2002)
Reinforcement Learning and Shaping: Encouraging Intended Behaviors (2002)
Scalable Internal-State Policy-Gradient Methods for POMDPs (2002)
The Infinite Hidden Markov Model (2002)
Polynomial Value Iteration Algorithms for Deterministic MDPs (2002)
Reinforcement Learning with Long Short-Term Memory (2001)
Evolutionary Search, Stochastic Policies with Memory, and Reinforcement Learning with Hidden State (2001)
A Natural Policy Gradient (2001)
Learning Probabilistic Models for Decision-Theoretic Navigation of Mobile Robots (2000)
Influence of initialisation and stop criteria on HMM based recognisers (2000)
Probabilistic DFA Inference using Kullback-Leibler Divergence and Minimality (2000)
Solving POMDPs Using Selected Past Events (2000)
Learning Finite-State Controllers for Partially Observable Environments (1999)
Model based Bayesian Exploration (1999)
Learning Policies with External Memory (1999)
Evolutionary Algorithms for Reinforcement Learning (1999)
A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory (1998)
Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes (1998)
Natural Gradient Works Efficiently in Learning (1998)
Planning and Acting in Partially Observable Stochastic Domains (1998)
A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants (1998)
Exact and approximate algorithms for partially observable Markov decision processes (1998)
Efficient Bayesian Parameter Estimation in Large Discrete Domains (1998)
Long Short-Term Memory (1997)
Algorithms for Sequential Decision Making (1996, influential reference)
Reinforcement learning with selective perception and hidden state (1996)
Reinforcement Learning: A Survey (1996)
The power of amnesia: Learning probabilistic automata with variable memory length (1996)
Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State (1995)
Markov Decision Processes: Discrete Stochastic Dynamic Programming (1994)
Instance-Based State Identification for Reinforcement Learning (1994)
Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems (1994)
