A New Softmax Operator for Reinforcement Learning

Published 2016 in arXiv.org

ABSTRACT

A softmax operator applied to a set of values acts somewhat like the maximization function and somewhat like an average. In sequential decision making, softmax is often used in settings where it is necessary to maximize utility but also to hedge against problems that arise from putting all of one's weight behind a single maximum-utility decision. The Boltzmann softmax operator is the most commonly used softmax operator in this setting, but we show that this operator is prone to misbehavior. In this work, we study an alternative softmax operator that, among other properties, is both a non-expansion (ensuring convergent behavior in learning and planning) and differentiable (making it possible to improve decisions via gradient-descent methods). We provide proofs of these properties and present empirical comparisons between various softmax operators.
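As a rough illustration of what "somewhat like the maximization function and somewhat like an average" means, the sketch below implements two operators over a list of values. The first is the standard Boltzmann-weighted average. The second uses the log-mean-exp form usually associated with this paper under the name mellowmax; that formula does not appear in the text here, so treat it as an assumption. The function names and the parameters `beta` and `omega` are hypothetical and chosen for illustration only.

```python
import math


def boltzmann_softmax(values, beta):
    """Boltzmann-weighted average: sum(x * exp(beta * x)) / sum(exp(beta * x)).

    beta = 0 gives the plain mean; large beta approaches the max.
    """
    m = max(values)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(beta * (v - m)) for v in values]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def log_mean_exp(values, omega):
    """Assumed alternative operator: log(mean(exp(omega * x))) / omega, for omega > 0.

    Small omega approaches the mean; large omega approaches the max.
    """
    m = max(values)  # same stability shift; it is added back below
    mean_exp = sum(math.exp(omega * (v - m)) for v in values) / len(values)
    return m + math.log(mean_exp) / omega


if __name__ == "__main__":
    vals = [1.0, 2.0, 3.0]
    for temperature in (0.01, 1.0, 10.0, 100.0):
        print(
            temperature,
            boltzmann_softmax(vals, temperature),
            log_mean_exp(vals, temperature),
        )
```

Running the script prints both operators for a few parameter settings, which shows each one sliding from the mean toward the max as the parameter grows. A non-expansion here means that changing every input by at most some amount changes the output by at most that amount; the sketch does not prove this property, it only gives concrete functions to experiment with.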

PUBLICATION RECORD

Publication year: 2016
Venue: arXiv.org
Publication date: 2016-12-16
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1612.05628

CITED BY

Deep Reinforcement Learning (2018)
Exploring Hierarchy-Aware Inverse Reinforcement Learning (2018)
Memristive Fully Convolutional Network: An Accurate Hardware Image-Segmentor in Deep Learning (2018)
A unified view of entropy-regularized Markov decision processes (2017, influential citation)
Quasi-Random Action Selection In Markov Decision Processes (2017)
Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming (2017, influential citation)
Deep Reinforcement Learning: An Overview (2017)
Bridging the Gap Between Value and Policy Based Reinforcement Learning (2017)
Smoothed Dual Embedding Control (2017)
Improving Policy Gradient by Exploring Under-appreciated Rewards (2016)