Diversity is All You Need: Learning Skills without a Reward Function

Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, S. Levine

Published 2018 in International Conference on Learning Representations

ABSTRACT

Intelligent creatures can explore their environments and learn useful skills without supervision. In this paper, we propose DIAYN ("Diversity is All You Need"), a method for learning useful skills without a reward function. Our proposed method learns skills by maximizing an information-theoretic objective using a maximum entropy policy. On a variety of simulated robotic tasks, we show that this simple objective results in the unsupervised emergence of diverse skills, such as walking and jumping. In a number of reinforcement learning benchmark environments, our method is able to learn a skill that solves the benchmark task despite never receiving the true task reward. In these environments, some of the learned skills correspond to solving the task, and each skill that solves the task does so in a distinct manner. Our results suggest that unsupervised discovery of skills can serve as an effective pretraining mechanism for overcoming challenges of exploration and data efficiency in reinforcement learning.
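
The abstract does not spell out the information-theoretic objective, so the following is only a minimal sketch of one form such a skill-discovery reward can take, not a reproduction of the authors' implementation. It assumes a discriminator that scores how likely each skill z is given the current state s, and a uniform prior over a fixed number of skills; the pseudo-reward is then log q(z|s) - log p(z). The function name `diayn_pseudo_reward` and its arguments are hypothetical, introduced purely for illustration.

```python
import math


def diayn_pseudo_reward(discriminator_logits, skill_index):
    """Hypothetical helper: log q(z|s) - log p(z) under a uniform skill prior.

    discriminator_logits: unnormalized scores, one per skill, produced by an
        assumed discriminator for the current state s.
    skill_index: index of the skill z the policy is currently executing.
    """
    num_skills = len(discriminator_logits)

    # Numerically stable log-softmax over the discriminator's skill logits.
    max_logit = max(discriminator_logits)
    log_norm = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in discriminator_logits)
    )
    log_q_z_given_s = discriminator_logits[skill_index] - log_norm

    # Assumed uniform prior over skills: log p(z) = -log(num_skills).
    log_p_z = -math.log(num_skills)

    return log_q_z_given_s - log_p_z


if __name__ == "__main__":
    # An uninformative discriminator gives zero reward for every skill.
    print(diayn_pseudo_reward([0.0, 0.0, 0.0, 0.0], skill_index=2))
    # A state the discriminator confidently attributes to skill 0 rewards skill 0.
    print(diayn_pseudo_reward([10.0, 0.0, 0.0, 0.0], skill_index=0))
```

Under these assumptions the reward is positive only when the visited state makes the active skill easier to identify than chance, which is one way to read the abstract's claim that diversity alone can drive skill learning. A maximum entropy policy would then be trained on this reward in place of a task reward.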

PUBLICATION RECORD

Publication year: 2018
Venue: International Conference on Learning Representations
Publication date: 2018-02-16
Fields of study: Computer Science
Identifiers: arXiv 1802.06070
Source metadata: Semantic Scholar

REFERENCES

Temporal Difference Models: Model-Free Deep RL for Model-Based Control
2018, cited by this paper
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
2018, influential reference
Learning an Embedding Space for Transferable Robot Skills
2018, cited by this paper
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning
2017, cited by this paper
Meta Learning Shared Hierarchies
2017, cited by this paper
EX2: Exploration with Exemplar Models for Deep Reinforcement Learning
2017, cited by this paper
Reinforcement Learning with Deep Energy-Based Policies
2017, influential reference
Equivalence Between Policy Gradients and Soft Q-Learning
2017, cited by this paper
Curiosity-Driven Exploration by Self-Supervised Prediction
2017, cited by this paper
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
2017, cited by this paper
Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play
2017, cited by this paper
Bridging the Gap Between Value and Policy Based Reinforcement Learning
2017, cited by this paper
DDCO: Discovery of Deep Continuous Options for Robot Learning from Demonstrations
2017, cited by this paper
Inverse Reward Design
2017, cited by this paper
Deep Reinforcement Learning that Matters
2017, cited by this paper
Deep Reinforcement Learning from Human Preferences
2017, cited by this paper
Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates
2016, cited by this paper
Mastering the game of Go with deep neural networks and tree search
2016, cited by this paper
VIME: Variational Information Maximizing Exploration
2016, influential reference
Target-driven visual navigation in indoor scenes using deep reinforcement learning
2016, cited by this paper
Variational Intrinsic Control
2016, influential reference
The Option-Critic Architecture
2016, influential reference
Stochastic Neural Networks for Hierarchical Reinforcement Learning
2016, influential reference
Unifying Count-Based Exploration and Intrinsic Motivation
2016, cited by this paper
Learning and Transfer of Modulated Locomotor Controllers
2016, cited by this paper
Benchmarking Deep Reinforcement Learning for Continuous Control
2016, influential reference
Learning to Navigate in Complex Environments
2016, cited by this paper
Quality Diversity: A New Frontier for Evolutionary Computation
2016, cited by this paper
High-Dimensional Continuous Control Using Generalized Advantage Estimation
2015, cited by this paper
Trust Region Policy Optimization
2015, cited by this paper
Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning
2015, cited by this paper
Active learning of inverse models with intrinsically motivated goal exploration in robots
2013, cited by this paper
Playing Atari with Deep Reinforcement Learning
2013, cited by this paper
Machine learning - a probabilistic perspective
2012, cited by this paper
On the deleterious effects of a priori objectives on evolution and representation
2011, cited by this paper
Empowerment for continuous agent-environment systems
2011, cited by this paper
Abandoning Objectives: Evolution Through the Search for Novelty Alone
2011, cited by this paper
Evolving a diversity of virtual creatures through novelty search and local competition
2011, cited by this paper
Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010)
2010, cited by this paper
Overcoming the bootstrap problem in evolutionary robotics using behavioral diversity
2009, cited by this paper
Maximum Entropy Inverse Reinforcement Learning
2008, cited by this paper
Pattern Recognition and Machine Learning
2006, cited by this paper
The IM algorithm: a variational approach to Information Maximization
2003, cited by this paper
Evolving Neural Networks through Augmenting Topologies
2002, cited by this paper
Intrinsic and Extrinsic Motivations: Classic Definitions and New Directions
2000, cited by this paper
Feudal Reinforcement Learning
1992, cited by this paper
The Matthew effect in science. The reward and communication systems of science are considered
1968, cited by this paper
Intrinsic Motivation Systems for Autonomous Mental Development
year unknown, cited by this paper

CITED BY

Unsupervised Hierarchical Skill Discovery
2026, cites this paper
$\kappa$-Explorer: A Unified Framework for Active Model Estimation in MDPs
2026, cites this paper
Convex Markov Games and Beyond: New Proof of Existence, Characterization and Learning Algorithms for Nash Equilibria
2026, cites this paper
APC-RL: Exceeding Data-Driven Behavior Priors with Adaptive Policy Composition
2026, cites this paper
Controllable Information Production
2026, influential citation
Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies
2026, cites this paper
Offline Discovery of Interpretable Skills from Multi-Task Trajectories
2026, cites this paper
Task-Aware Exploration via a Predictive Bisimulation Metric
2026, cites this paper
Group-Invariant Unsupervised Skill Discovery: Symmetry-aware Skill Representations for Generalizable Behavior
2026, cites this paper
DecisionLLM: Large Language Models for Long Sequence Decision Exploration
2026, cites this paper
UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs
2026, cites this paper
Can We Really Learn One Representation to Optimize All Rewards?
2026, cites this paper
SuS: Strategy-aware Surprise for Intrinsic Exploration
2026, cites this paper
Proximal Policy Optimization with Evolutionary Mutations
2026, cites this paper
K-Myriad: Jump-starting reinforcement learning with unsupervised parallel agents
2026, cites this paper
SUSD: Structured Unsupervised Skill Discovery through State Factorization
2026, influential citation
Localized Dynamics-Aware Domain Adaption for Off-Dynamics Offline Reinforcement Learning
2026, cites this paper
Framework for hierarchical deep reinforcement learning with conceptual embedding
2026, cites this paper
Learning Policy Representations for Steerable Behavior Synthesis
2026, cites this paper
Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals
2026, influential citation
Why motor learning involves multiple systems: an algorithmic perspective
2026, cites this paper
Evolving Programmatic Skill Networks
2026, cites this paper
Joint Learning of Hierarchical Neural Options and Abstract World Model
2026, cites this paper
Maximum Likelihood Reinforcement Learning
2026, cites this paper
Reward-Conditioned Reinforcement Learning
2026, cites this paper
Playbook: Scalable Discrete Skill Discovery From Unstructured Datasets for Long-Horizon Decision-Making Problems
2025, cites this paper
A Survey of Behavior Foundation Model: Next-Generation Whole-Body Control System of Humanoid Robots
2025, cites this paper
Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime
2025, cites this paper
Modeling intrinsic motivation as reflective planning
2025, cites this paper
Unsupervised Skill Discovery as Exploration for Learning Agile Locomotion
2025, cites this paper
Epistemically-guided forward-backward exploration
2025, influential citation
A Study of Value-Aware Eigenoptions
2025, cites this paper
Generalizable Agent Modeling for Agent Collaboration-Competition Adaptation with Multi-Retrieval and Dynamic Generation
2025, cites this paper
Studying Exploration in RL: An Optimal Transport Analysis of Occupancy Measure Trajectories
2025, cites this paper
Structured Diversity Control: A Dual-Level Framework for Group-Aware Multi-Agent Coordination
2025, cites this paper
Automated Skill Discovery for Language Agents through Exploration and Iterative Feedback
2025, cites this paper
Diversity-Aware Policy Optimization for Large Language Model Reasoning
2025, influential citation
AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization
2025, cites this paper
A Motivational Architecture for Open-Ended Learning Challenges in Robots
2025, cites this paper
Goal Discovery with Causal Capacity for Efficient Reinforcement Learning
2025, cites this paper
ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World
2025, cites this paper
Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning
2025, cites this paper
D3HRL: A Distributed Hierarchical Reinforcement Learning Approach Based on Causal Discovery and Spurious Correlation Detection
2025, cites this paper
SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation
2025, cites this paper
H-GRAIL: A Robotic Motivational Architecture to Tackle Open-Ended Learning Challenges
2025, influential citation
Rethinking Agent Design: From Top-Down Workflows to Bottom-Up Skill Evolution
2025, cites this paper
Maximizing Confidence Alone Improves Reasoning
2025, cites this paper
When Does Neuroevolution Outcompete Reinforcement Learning in Transfer Learning Tasks?
2025, cites this paper
SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL
2025, cites this paper
AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification
2025, influential citation
Intention-Conditioned Flow Occupancy Models
2025, influential citation
Exploration by Random Reward Perturbation
2025, cites this paper
PB2: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning
2025, cites this paper
Learning Task-Agnostic Motifs to Capture the Continuous Nature of Animal Behavior
2025, cites this paper
Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning
2025, cites this paper
Efficient Skill Discovery via Regret-Aware Optimization
2025, influential citation
Learning Temporal Abstractions via Variational Homomorphisms in Option-Induced Abstract MDPs
2025, cites this paper
Illuminating the Three Dogmas of Reinforcement Learning under Evolutionary Light
2025, cites this paper
Skill Learning via Policy Diversity Yields Identifiable Representations for Reinforcement Learning
2025, influential citation
Automatic penetration testing model based on reinforcement learning for complex network environments
2025, cites this paper
REBot: Reflexive Evasion Robot for Instantaneous Dynamic Obstacle Avoidance
2025, cites this paper
Unsupervised Partner Design Enables Robust Ad-hoc Teamwork
2025, cites this paper
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
2025, cites this paper
Adaptive Weighting by Sinkhorn Distance for Sharing Experiences between Multi-Task Reinforcement Learning in Sparse-reward Environments
2025, cites this paper
InfoPO: On Mutual Information Maximization for Large Language Model Alignment
2025, cites this paper
Skill Learning Using a Single Demonstration for Hierarchical Robotic Navigation
2025, cites this paper
In-Context Policy Adaptation via Cross-Domain Skill Diffusion
2025, cites this paper
Self-Referencing Agents for Unsupervised Reinforcement Learning
2025, cites this paper
Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models
2025, cites this paper
Causal action empowerment for efficient reinforcement learning in embodied agents
2025, cites this paper
MRSD: Multi-Resolution Skill Discovery for HRL Agents
2025, influential citation
Evolutionary Policy Optimization
2025, cites this paper
Behaviour Discovery and Attribution for Explainable Reinforcement Learning
2025, cites this paper
Open-World Skill Discovery from Unsegmented Demonstrations
2025, cites this paper
SAC-alpha: dynamic entropy adjustment for enhanced autonomous exploration in unknown environments
2025, cites this paper
Offline Reinforcement Learning with Discrete Diffusion Skills
2025, cites this paper
Latent Embedding Adaptation for Human Preference Alignment in Diffusion Planners
2025, cites this paper
Option Discovery Using LLM-guided Semantic Hierarchical Reinforcement Learning
2025, cites this paper
Balancing State Exploration and Skill Diversity in Unsupervised Skill Discovery
2025, influential citation
Autonomous state-space segmentation for Deep-RL sparse reward scenarios
2025, cites this paper
Fast Adaptation with Behavioral Foundation Models
2025, cites this paper
Vector Quantized-Elites: Unsupervised and Problem-Agnostic Quality-Diversity Optimization
2025, cites this paper
Emergence of Goal-Directed Behaviors via Active Inference with Self-Prior
2025, cites this paper
Reinforcement Learning for Fail-Operational Systems with Disentangled Dual-Skill Variables
2025, cites this paper
Reinforcement Learning from Multi-level and Episodic Human Feedback
2025, cites this paper
Improving Human-AI Coordination through Adversarial Training and Generative Models
2025, cites this paper
$n$-LIPO: Framework for Diverse Cooperative Agent Generation Using Policy Compatibility
2025, cites this paper
Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story
2025, cites this paper
Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation
2025, cites this paper
Optimization methods in fully cooperative scenarios: a review of multiagent reinforcement learning
2025, cites this paper
Generalized Behavior Learning from Diverse Demonstrations
2025, cites this paper
Explainable Reinforcement Learning Agents Using World Models
2025, cites this paper
InnateCoder: Learning Programmatic Options with Foundation Models
2025, cites this paper
Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning
2025, cites this paper
Causally Aligned Curriculum Learning
2025, cites this paper
Task Adaptation from Skills: Information Geometry, Disentanglement, and New Objectives for Unsupervised Reinforcement Learning
2025, influential citation
Training RL Agents for Multi-Objective Network Defense Tasks
2025, cites this paper
Hierarchical Reinforcement Learning with Uncertainty-Guided Diffusional Subgoals
2025, cites this paper
Trajectory First: A Curriculum for Discovering Diverse Policies
2025, cites this paper
Efficient Reinforcement Learning by Guiding Generalist World Models with Non-Curated Data
2025, cites this paper