Adaptive Correlation-Weighted Intrinsic Rewards for Reinforcement Learning

Published 2026 in Unknown venue

ABSTRACT

We propose ACWI (Adaptive Correlation-Weighted Intrinsic), an adaptive intrinsic-reward scaling framework that dynamically balances intrinsic and extrinsic rewards to improve exploration in sparse-reward reinforcement learning. Conventional approaches rely on manually tuned scalar coefficients, which often yield unstable or suboptimal performance across tasks; ACWI instead learns a state-dependent scaling coefficient online. Specifically, ACWI introduces a lightweight Beta Network with an encoder-based architecture that predicts the intrinsic reward weight directly from the agent's state. The scaling mechanism is optimized with a correlation-based objective that encourages alignment between the weighted intrinsic rewards and the discounted future extrinsic returns. This formulation enables task-adaptive exploration incentives while preserving computational efficiency and training stability. We evaluate ACWI on a suite of sparse-reward MiniGrid environments. Experimental results show that ACWI consistently improves sample efficiency and learning stability over fixed intrinsic reward baselines, with minimal computational overhead.
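The abstract describes the mechanism only at a high level, so the following is a minimal sketch of one way a correlation-based objective of this kind could be computed, not the authors' implementation. It assumes a Pearson correlation over a single episode, a sigmoid output that keeps the weight in (0, 1), and a single linear layer standing in for the Beta Network; the names `beta`, `pearson`, and `correlation_loss` are hypothetical.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = sum_k gamma^k * r_{t+k} for each step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def pearson(xs, ys, eps=1e-8):
    """Pearson correlation of two equal-length sequences (eps avoids 0/0)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (math.sqrt(var_x * var_y) + eps)


def beta(state, weights, bias):
    """Hypothetical stand-in for the Beta Network: linear layer + sigmoid.

    Assumption: the real network uses an encoder; here a single linear
    map produces a state-dependent weight in (0, 1).
    """
    z = sum(w * s for w, s in zip(weights, state)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def correlation_loss(states, intrinsic, extrinsic, weights, bias, gamma=0.99):
    """Negative correlation between weighted intrinsic rewards and
    discounted extrinsic returns; minimizing it encourages alignment."""
    weighted = [beta(s, weights, bias) * r for s, r in zip(states, intrinsic)]
    return -pearson(weighted, discounted_returns(extrinsic, gamma))
```

In a full agent, the parameters of the weight predictor would be updated by gradient descent on this loss with an automatic differentiation library, and the policy would presumably train on the extrinsic reward plus the weighted intrinsic reward; both details are assumptions, since the abstract does not specify them.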

PUBLICATION RECORD

Publication year: 2026
Venue: Unknown venue
Publication date: 2026-02-27
Fields of study: Computer Science
Identifiers: arXiv 2602.24081
Source metadata: Semantic Scholar

REFERENCES

Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning (2023), cited by this paper
Redeeming Intrinsic Rewards via Constrained Optimization (2022), cited by this paper
Self-Tuning Deep Reinforcement Learning (2020), cited by this paper
RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments (2020), cited by this paper
Agent57: Outperforming the Atari Human Benchmark (2020), cited by this paper
On Bonus Based Exploration Methods In The Arcade Learning Environment (2020), cited by this paper
Never Give Up: Learning Directed Exploration Strategies (2020), cited by this paper
EMI: Exploration with Mutual Information (2018), cited by this paper
Exploration by Random Network Distillation (2018), influential reference
An Introduction to Deep Reinforcement Learning (2018), cited by this paper
Curiosity-Driven Exploration by Self-Supervised Prediction (2017), influential reference
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (2017), cited by this paper
Count-Based Exploration with Neural Density Models (2017), cited by this paper
Proximal Policy Optimization Algorithms (2017), influential reference
Reinforcement Learning with Unsupervised Auxiliary Tasks (2016), cited by this paper
Mastering the game of Go with deep neural networks and tree search (2016), cited by this paper
VIME: Variational Information Maximizing Exploration (2016), cited by this paper
Unifying Count-Based Exploration and Intrinsic Motivation (2016), influential reference
Learning to Navigate in Complex Environments (2016), cited by this paper
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning (2016), cited by this paper
High-Dimensional Continuous Control Using Generalized Advantage Estimation (2015), cited by this paper
On the difficulty of training recurrent neural networks (2012), cited by this paper
Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010) (2010), cited by this paper
An analysis of model-based Interval Estimation for Markov Decision Processes (2008), cited by this paper
Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping (1999), cited by this paper
