Policy Gradients with Variance Related Risk Criteria

Published 2012 in International Conference on Machine Learning

ABSTRACT

Managing risk in dynamic decision problems is of cardinal importance in many fields such as finance and process control. The most common approach to defining risk is through various variance related criteria such as the Sharpe Ratio or the standard deviation adjusted reward. It is known that optimizing many of the variance related risk criteria is NP-hard. In this paper we devise a framework for local policy gradient style algorithms for reinforcement learning for variance related criteria. Our starting point is a new formula for the variance of the cost-to-go in episodic tasks. Using this formula we develop policy gradient algorithms for criteria that involve both the expected cost and the variance of the cost. We prove the convergence of these algorithms to local minima and demonstrate their applicability in a portfolio planning problem.
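The abstract's idea of policy gradients for criteria that combine the expected return and its variance can be illustrated with a minimal sketch. The toy one-step problem below, the function names, and the batch Monte Carlo form are all assumptions made for illustration and are not taken from the paper, whose algorithms may differ in detail. The sketch relies only on the standard likelihood-ratio identities: the gradient of E[B] is E[B * score], and the gradient of Var[B] = E[B^2] - (E[B])^2 is E[B^2 * score] - 2 * E[B] * E[B * score], where B is the episode's total reward and score is the derivative of the log-probability of the sampled actions.

```python
import math
import random


def sample_episode(theta, rng):
    """Hypothetical one-step episode: return (total reward B, score)."""
    p_risky = 1.0 / (1.0 + math.exp(-theta))  # probability of the risky action
    if rng.random() < p_risky:
        reward = rng.gauss(2.0, 3.0)  # risky action: higher mean, high variance
        score = 1.0 - p_risky         # d/dtheta log(p_risky)
    else:
        reward = 1.0                  # safe action: deterministic reward
        score = -p_risky              # d/dtheta log(1 - p_risky)
    return reward, score


def estimate_gradients(theta, n_episodes=200_000, seed=0):
    """Monte Carlo estimates of E[B], dE[B]/dtheta and dVar[B]/dtheta."""
    rng = random.Random(seed)
    sum_b = sum_bs = sum_b2s = 0.0
    for _ in range(n_episodes):
        b, s = sample_episode(theta, rng)
        sum_b += b
        sum_bs += b * s
        sum_b2s += b * b * s
    mean_b = sum_b / n_episodes
    grad_mean = sum_bs / n_episodes            # estimates d/dtheta E[B]
    grad_second_moment = sum_b2s / n_episodes  # estimates d/dtheta E[B^2]
    # Chain rule on Var[B] = E[B^2] - (E[B])^2.
    grad_var = grad_second_moment - 2.0 * mean_b * grad_mean
    return mean_b, grad_mean, grad_var


def penalized_gradient(theta, risk_weight):
    """Ascent direction for the assumed criterion E[B] - risk_weight * Var[B]."""
    _, grad_mean, grad_var = estimate_gradients(theta)
    return grad_mean - risk_weight * grad_var
```

In this toy problem the exact values at theta = 0 are E[B] = 1.5, a mean gradient of 0.25, and a variance gradient of 2.25, so the estimates can be checked against them. With a sufficiently large risk_weight the penalized gradient turns negative, pushing the policy toward the safe action.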

PUBLICATION RECORD

Publication year
2012
Venue
International Conference on Machine Learning
Publication date
2012-06-26
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1206.6404
Source metadata
Semantic Scholar

REFERENCES

Mean-Variance Optimization in Markov Decision Processes (2011, cited by this paper)
Percentile Optimization for Markov Decision Processes with Parameter Uncertainty (2010, cited by this paper)
Neuro-Dynamic Programming (2009, cited by this paper)
Mutual Fund Performance (2007, cited by this paper)
An analytic solution to discrete Bayesian reinforcement learning (2006, cited by this paper)
Risk-Sensitive Reinforcement Learning Applied to Control under Constraints (2005, cited by this paper)
Robust Control of Markov Decision Processes with Uncertain Transition Matrices (2005, cited by this paper)
Convergent multiple-timescales reinforcement learning algorithms in normal form games (2003, cited by this paper)
Risk-Sensitive Optimal Control for Markov Decision Processes with Monotone Cost (2002, cited by this paper)
Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning (2001, cited by this paper)
Infinite-Horizon Policy-Gradient Estimation (2001, cited by this paper)
Simulation-based optimization of Markov reward processes (1998, cited by this paper)
Stochastic approximation with two time scales (1997, influential reference)
Risk Sensitive Markov Decision Processes (1997, cited by this paper)
Dynamic Programming and Optimal Control, Two Volume Set (1995, cited by this paper)
Percentile performance criteria for limiting average Markov decision processes (1995, cited by this paper)
Markov Decision Processes: Discrete Stochastic Dynamic Programming (1994, cited by this paper)
The variance of discounted Markov decision processes (1982, influential reference)
Dynamic Programming and Optimal Control, Vol. II (1976, influential reference)
Investment in Science (1962, cited by this paper)

CITED BY

Model-Agnostic Solutions for Deep Reinforcement Learning in Non-Ergodic Contexts (2026, cites this paper)
Risk-Sensitive Exponential Actor Critic (2026, cites this paper)
Pareto-Aware Dual-Preference Optimization for Task-Oriented Dialogue (2026, cites this paper)
Drone-Aided Secure Task Offloading Optimization for Internet of Vehicles: Review, Challenges and Method (2026, cites this paper)
Tackling value estimation bias in successor features by distributional reinforcement learning (2026, cites this paper)
Time-Inhomogeneous Volatility Aversion for Financial Applications of Reinforcement Learning (2026, cites this paper)
Boosting CVaR Policy Optimization with Quantile Gradients (2026, cites this paper)
Robust guaranteed neural learning-based output tracking control for uncertain nonlinear systems: An uncertainty feedback compensation method (2026, cites this paper)
Cross-Domain Deep Reinforcement Learning for Real-Time Resource Allocation in Transportation Hubs: From Airport Gates to Seaport Berths (2026, cites this paper)
Risk-sensitive reinforcement learning using expectiles, shortfall risk and optimized certainty equivalent risk (2026, cites this paper)
Decoupling Time and Risk: Risk-Sensitive Reinforcement Learning with General Discounting (2026, cites this paper)
Recoverability Has a Law: The ERR Measure for Tool-Augmented Agents (2026, cites this paper)
Language Modeling for the Future of Finance: A Quantitative Survey into Metrics, Tasks, and Data Opportunities (2025, cites this paper)
Game-Theoretic Constrained Policy Optimization for Safe Reinforcement Learning (2025, cites this paper)
Quantum Decision Transformers (QDT): Synergistic Entanglement and Interference for Offline Reinforcement Learning (2025, cites this paper)
Measures of Variability for Risk-averse Policy Gradient (2025, influential citation)
Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints (2025, cites this paper)
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction (2025, cites this paper)
Uncertainty Prioritized Experience Replay (2025, cites this paper)
Team variance optimization of n-player stochastic games with separately controlled chains (2025, cites this paper)
Risk-sensitive Actor-Critic with Static Spectral Risk Measures for Online and Offline Reinforcement Learning (2025, cites this paper)
Parallel Inspection Route Optimization With Priorities for 5G Base Station Networks (2025, cites this paper)
Perturbation-Controlled Deep Q-Learning With Human-Teaming for Enhancing Adversarial Robustness (2025, cites this paper)
Sharpe Ratio Optimization in Markov Decision Processes (2025, cites this paper)
Risk-Sensitive Variational Actor-Critic: A Model-Based Approach (2025, cites this paper)
Robust Dynamic Material Handling via Adaptive Constrained Evolutionary Reinforcement Learning (2025, cites this paper)
On Generalization and Distributional Update for Mimicking Observations with Adequate Exploration (2025, cites this paper)
Beyond CVaR: Leveraging Static Spectral Risk Measures for Enhanced Decision-Making in Distributional Reinforcement Learning (2025, cites this paper)
All-time safety and sample-efficient meta update for online safe meta reinforcement learning under Markov task transition (2025, cites this paper)
A Survey of Safe Reinforcement Learning and Constrained MDPs: A Technical Survey on Single-Agent and Multi-Agent Safety (2025, cites this paper)
Addressing Moral Uncertainty using Large Language Models for Ethical Decision-Making (2025, cites this paper)
RAMAC: Multimodal Risk-Aware Offline Reinforcement Learning and the Role of Behavior Regularization (2025, cites this paper)
A Safe Reinforcement Learning Algorithm for Supervisory Control of Power Plants (2024, cites this paper)
A comparison of RL-based and PID controllers for 6-DOF swimming robots: hybrid underwater object tracking (2024, cites this paper)
Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning under Distribution Shifts (2024, cites this paper)
A Survey of Constraint Formulations in Safe Reinforcement Learning (2024, cites this paper)
A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization (2024, cites this paper)
A Review of Safe Reinforcement Learning: Methods, Theories, and Applications (2024, cites this paper)
Reward Penalties on Augmented States for Solving Richly Constrained RL Effectively (2024, cites this paper)
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning (2024, cites this paper)
Navigating the Difficulty of Achieving Global Optimality under Variance-Induced Time Inconsistency (2024, cites this paper)
Federated reinforcement learning for robot motion planning with zero-shot generalization (2024, cites this paper)
Learning in practice: reinforcement learning-based traffic signal control augmented with actuated control (2024, cites this paper)
Risk-Sensitive Multi-Agent Reinforcement Learning in Network Aggregative Markov Games (2024, cites this paper)
Percentile Criterion Optimization in Offline Reinforcement Learning (2024, cites this paper)
Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients (2024, cites this paper)
Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis (2024, cites this paper)
Progressive Safeguards for Safe and Model-Agnostic Reinforcement Learning (2024, cites this paper)
Moor: Model-based offline policy optimization with a risk dynamics model (2024, cites this paper)
Integrating Risk-Averse and Constrained Reinforcement Learning for Robust Decision-Making in High-Stakes Scenarios (2024, cites this paper)
Operator Splitting for Convex Constrained Markov Decision Processes (2024, cites this paper)
A Black Swan Hypothesis: The Role of Human Irrationality in AI Safety (2024, cites this paper)
CVA Hedging by Risk-Averse Stochastic-Horizon Reinforcement Learning (2023, cites this paper)
Modeling Risk in Reinforcement Learning: A Literature Mapping (2023, cites this paper)
On the Feasibility Guarantees of Deep Reinforcement Learning Solutions for Distribution System Operation (2023, cites this paper)
Risk-averse control of Markov systems with value function learning (2023, cites this paper)
On the Global Convergence of Risk-Averse Policy Gradient Methods with Dynamic Time-Consistent Risk Measures (2023, cites this paper)
Safe and Sample-Efficient Reinforcement Learning for Clustered Dynamic Environments (2023, cites this paper)
One Risk to Rule Them All: Addressing Distributional Shift in Offline Reinforcement Learning via Risk-Aversion (2023, cites this paper)
Regularized Robust MDPs and Risk-Sensitive MDPs: Equivalence, Policy Gradient, and Sample Complexity (2023, cites this paper)
Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning (2023, cites this paper)
Constrained Reinforcement Learning in Hard Exploration Problems (2023, cites this paper)
Soft Robust MDPs and Risk-Sensitive MDPs: Equivalence, Policy Gradient, and Sample Complexity (2023, cites this paper)
Solving Richly Constrained Reinforcement Learning through State Augmentation and Reward Penalties (2023, cites this paper)
Decision-making under uncertainty: beyond probabilities (2023, cites this paper)
Entropic Risk Optimization in Discounted MDPs (2023, cites this paper)
Solving Constrained Reinforcement Learning through Augmented State and Reward Penalties (2023, cites this paper)
An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient (2023, influential citation)
Global Algorithms for Mean-Variance Optimization in Markov Decision Processes (2023, cites this paper)
Near-optimal Conservative Exploration in Reinforcement Learning under Episode-wise Constraints (2023, cites this paper)
A human-centered safe robot reinforcement learning framework with interactive behaviors (2023, cites this paper)
STOPS: Short-Term-based Volatility-controlled Policy Search and its Global Convergence (2022, cites this paper)
Learning a Shield from Catastrophic Action Effects: Never Repeat the Same Mistake (2022, cites this paper)
TOPS: Transition-based VOlatility-controlled Policy Search and its Global Convergence (2022, cites this paper)
Privacy-Preserving Reinforcement Learning Beyond Expectation (2022, cites this paper)
A unified algorithm framework for mean-variance optimization in discounted Markov decision processes (2022, cites this paper)
Distributed Safe Learning and Planning for Multirobot Systems (2022, cites this paper)
A Distributional Framework for Risk-Sensitive End-to-End Planning in Continuous MDPs (2022, cites this paper)
Risk-Averse Reinforcement Learning via Dynamic Time-Consistent Risk Measures (2022, cites this paper)
SafeLight: A Reinforcement Learning Method toward Collision-free Traffic Signal Control (2022, cites this paper)
Risk-Sensitive Reinforcement Learning With Exponential Criteria (2022, cites this paper)
Reinventing Policy Iteration under Time Inconsistency (2022, cites this paper)
Towards multi‐agent reinforcement learning‐driven over‐the‐counter market simulations (2022, cites this paper)
Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL (2022, cites this paper)
Risk-Sensitive Policy with Distributional Reinforcement Learning (2022, cites this paper)
Embracing Risk in Reinforcement Learning: The Connection between Risk-Sensitive Exponential and Distributionally Robust Criteria (2022, cites this paper)
Challenging Common Assumptions in Convex Reinforcement Learning (2022, cites this paper)
A Probabilistic Perspective on Risk-sensitive Reinforcement Learning (2022, cites this paper)
Provably Efficient Risk-Sensitive Reinforcement Learning: Iterated CVaR and Worst Path (2022, cites this paper)
Learning Generalizable Risk-Sensitive Policies to Coordinate in Decentralized Multi-Agent General-Sum Games (2022, cites this paper)
A policy gradient approach for optimization of smooth risk measures (2022, cites this paper)
One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning (2022, cites this paper)
A Unifying Framework of Off-Policy General Value Function Evaluation (2022, cites this paper)
A Review of Safe Reinforcement Learning: Methods, Theory and Applications (2022, cites this paper)
Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, and Separation Design (2022, cites this paper)
Distributional Reinforcement Learning for Risk-Sensitive Policies (2022, cites this paper)
A Rule-based Shield: Accumulating Safety Rules from Catastrophic Action Effects (2022, cites this paper)
Deep reinforcement learning for data-driven adaptive scanning in ptychography (2022, cites this paper)
Offline Policy Optimization in RL with Variance Regularizaton (2022, influential citation)
Finite Sample Analysis of Mean-Volatility Actor-Critic for Risk-Averse Reinforcement Learning (2022, influential citation)