Diffusion Policy through Conditional Proximal Policy Optimization

Published 2026 in Unknown venue

ABSTRACT

Reinforcement learning (RL) has been extensively employed in a wide range of decision-making problems, such as games and robotics. Recently, diffusion policies have shown strong potential in modeling multimodal behaviors, enabling more diverse and flexible action generation than the conventional Gaussian policy. Despite various attempts to combine RL with diffusion, a key challenge is the difficulty of computing the action log-likelihood under the diffusion model. This greatly hinders the direct application of diffusion policies in on-policy reinforcement learning. Most existing methods calculate or approximate the log-likelihood through the entire denoising process of the diffusion model, which can be inefficient in both memory and computation. To overcome this challenge, we propose a novel and efficient method for training a diffusion policy in an on-policy setting that requires only evaluating a simple Gaussian probability. This is achieved by aligning the policy iteration with the diffusion process, a paradigm distinct from previous work. Moreover, our formulation naturally handles entropy regularization, which is often difficult to incorporate into diffusion policies. Experiments demonstrate that the proposed method produces multimodal policy behaviors and achieves superior performance on a variety of benchmark tasks in both IsaacLab and MuJoCo Playground.
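
To illustrate why needing "only a simple Gaussian probability" matters for on-policy training, here is a minimal sketch of the two generic ingredients the abstract refers to: a closed-form diagonal Gaussian log-density and the standard PPO clipped surrogate built from a log-probability ratio. This is not the paper's algorithm. The function names, the per-sample formulation, and the toy numbers are assumptions made for illustration, and the abstract does not specify how the Gaussian mean and standard deviation are obtained from the diffusion process.

```python
import math


def gaussian_log_prob(action, mean, std):
    """Log-density of a diagonal Gaussian N(mean, diag(std^2)) at `action`.

    Hypothetical helper: a closed-form evaluation, with no denoising chain.
    """
    return sum(
        -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for a, m, s in zip(action, mean, std)
    )


def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for one sample (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped * advantage)


# Toy example (made-up numbers): one 2-D action scored under an old and
# an updated Gaussian, then plugged into the clipped surrogate.
action = [0.3, -0.1]
logp_old = gaussian_log_prob(action, mean=[0.0, 0.0], std=[1.0, 1.0])
logp_new = gaussian_log_prob(action, mean=[0.2, 0.0], std=[1.0, 1.0])
objective = ppo_clipped_objective(logp_new, logp_old, advantage=1.0)
```

Both quantities cost a handful of arithmetic operations per action dimension, in contrast to approaches that propagate a likelihood through every denoising step as described above.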

PUBLICATION RECORD

Publication year
2026
Venue
Unknown venue
Publication date
2026-03-05
Fields of study
Computer Science
Identifiers
arXiv 2603.04790
Source metadata
Semantic Scholar

REFERENCES

Diffusion Guidance Is a Controllable Policy Improvement Operator (2025)
Maximum Entropy Reinforcement Learning with Diffusion Policy (2025)
Flow Matching Policy Gradients (2025, influential reference)
Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization (2024)
Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow (2024)
Diffusion Actor-Critic with Entropy Regulator (2024)
Diffusion Policy Policy Optimization (2024)
Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning (2024, influential reference)
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion (2023)
Learning a Diffusion Model Policy from Rewards via Q-Score Matching (2023)
Policy Representation via Diffusion Probability Model for Reinforcement Learning (2023)
Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments (2023, influential reference)
Elucidating the Design Space of Diffusion-Based Generative Models (2022)
Classifier-Free Diffusion Guidance (2022)
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning (2022)
Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling (2022)
Flow Matching for Generative Modeling (2022)
Diffusion Models Beat GANs on Image Synthesis (2021)
Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning (2021, influential reference)
Score-Based Generative Modeling through Stochastic Differential Equations (2020, influential reference)
Denoising Diffusion Probabilistic Models (2020)
Cover (2020)
Generative Modeling by Estimating Gradients of the Data Distribution (2019)
Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning (2019)
Neural Ordinary Differential Equations (2018)
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (2018)
Reinforcement Learning with Deep Energy-Based Policies (2017)
Proximal Policy Optimization Algorithms (2017)
Numerical Solution of Stochastic Differential Equations (2015)
High-Dimensional Continuous Control Using Generalized Advantage Estimation (2015)
Trust Region Policy Optimization (2015)
