Stabilizing Reinforcement Learning in Dynamic Environment with Application to Online Recommendation

Shi-Yong Chen, Yang Yu, Qing Da, Jun Tan, Haikuan Huang, Haihong Tang

Published 2018 in Knowledge Discovery and Data Mining

ABSTRACT

Deep reinforcement learning has shown great potential in improving system performance autonomously by learning from interactions with the environment. However, traditional reinforcement learning approaches are designed to work in static environments. In many real-world problems, the environments are dynamic, and the performance of reinforcement learning approaches can degrade drastically. A direct cause of the performance degradation is the high-variance and biased estimation of the reward, due to distribution shift in dynamic environments. In this paper, we propose two techniques to alleviate the unstable reward estimation problem in dynamic environments: the stratified sampling replay strategy and the approximate regretted reward, which address the problem from the sample aspect and the reward aspect, respectively. Integrating the two techniques with Double DQN, we propose the Robust DQN method. We apply Robust DQN to the tip recommendation system on the Taobao online retail trading platform. We first disclose the highly dynamic property of the recommendation application. We then carry out an online A/B test to examine Robust DQN. The results show that Robust DQN can effectively stabilize the value estimation and therefore improve performance in this real-world dynamic environment.
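
The abstract names three ingredients without giving their details: stratified sampling replay, a regretted reward, and Double DQN. The Python sketch below is one possible reading, not the authors' implementation. The class and function names, the stratum keys, the fixed proportions, and the use of a simple benchmark subtraction for the regretted reward are all assumptions made for illustration; only the Double DQN target (select the next action with the online network, evaluate it with the target network) follows the standard published definition.

```python
import random
from collections import defaultdict


class StratifiedReplay:
    """Hypothetical replay buffer that draws a fixed share of each batch per stratum."""

    def __init__(self, proportions, seed=0):
        # proportions: assumed mapping stratum -> target share of a batch (sums to 1).
        self.proportions = dict(proportions)
        self.buffers = defaultdict(list)
        self.rng = random.Random(seed)

    def add(self, stratum, transition):
        self.buffers[stratum].append(transition)

    def sample(self, batch_size):
        batch = []
        for stratum, share in self.proportions.items():
            pool = self.buffers[stratum]
            if not pool:
                continue  # nothing stored for this stratum yet
            quota = round(batch_size * share)
            # Sample with replacement so a small stratum can still fill its quota.
            batch.extend(self.rng.choices(pool, k=quota))
        return batch


def regretted_reward(reward, benchmark_reward):
    """Assumed form: reward measured relative to a benchmark from the same period."""
    return reward - benchmark_reward


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Standard Double DQN target: online net picks the action, target net scores it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


buf = StratifiedReplay({"segment_a": 0.75, "segment_b": 0.25})
for i in range(100):
    buf.add("segment_a", ("segment_a", i))
for i in range(3):
    buf.add("segment_b", ("segment_b", i))
batch = buf.sample(8)
```

Fixing the per-stratum share keeps the composition of each training batch stable even when the mix of incoming traffic drifts, which is the intuition the abstract attributes to the sample-side technique. Subtracting a benchmark is meant to remove reward fluctuations that affect all policies alike.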

PUBLICATION RECORD

Publication year
2018
Venue
Knowledge Discovery and Data Mining
Publication date
2018-07-19
Fields of study
Computer Science
Identifiers
DOI 10.1145/3219819.3220122
Source metadata
Semantic Scholar

REFERENCES

Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application
2018, cited by this paper
Reinforcement Learning based Recommender System using Biclustering Technique
2018, cited by this paper
A Survey on Recommendation System
2017, cited by this paper
Mastering the game of Go without human knowledge
2017, cited by this paper
Q-learning with experience replay in a dynamic environment
2016, cited by this paper
Deep Exploration via Bootstrapped DQN
2016, cited by this paper
Addressing Environment Non-Stationarity by Repeating Q-learning Updates
2016, cited by this paper
Mastering the game of Go with deep neural networks and tree search
2016, cited by this paper
Human-level control through deep reinforcement learning
2015, cited by this paper
Dueling Network Architectures for Deep Reinforcement Learning
2015, cited by this paper
Prioritized Experience Replay
2015, cited by this paper
Deep Reinforcement Learning with Double Q-Learning
2015, cited by this paper
DJ-MC: A Reinforcement-Learning Agent for Music Playlist Recommendation
2014, cited by this paper
Reinforcement learning for dynamic environment: a classification of dynamic environments and a detection method of environmental changes
2013, cited by this paper
Playing Atari with Deep Reinforcement Learning
2013, cited by this paper
Supplementary Materials for: Effect of Separate Sampling on Classification Accuracy
2013, cited by this paper
Play it again: reactivation of waking experience and memory.
2010, cited by this paper
Double Q-learning
2010, cited by this paper
Survey sampling: theory and methods
2008, cited by this paper
Finite-time Analysis of the Multiarmed Bandit Problem
2002, cited by this paper
Reinforcement Learning in Dynamic Environments using Instantiated Information
2001, cited by this paper
Introduction to Reinforcement Learning
1998, cited by this paper
Reinforcement Learning: An Introduction
1998, cited by this paper
Learning from delayed rewards
1995, cited by this paper
Learning from delayed rewards
1989, cited by this paper

CITED BY

Adaptive Attention-based State Representation in reinforcement learning based recommendation systems
2026, cites this paper
A Long-term Value Prediction Framework In Video Ranking
2026, cites this paper
Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learning in Interactive Recommendation
2026, cites this paper
MARCO: A Cooperative Knowledge Transfer Framework for Personalized Cross-domain Recommendations
2025, cites this paper
Deep Learning Application in Sales Automation and Customer Experience Personalization in Small and Medium-Sized Business: A Hybrid Approach Using Transformer-Based Large Language Models and Reinforcement Learning
2025, cites this paper
Examining the Generalisability of Robust Loss Functions: A Comparative Study of Q-Learning and SARSA Performance
2025, cites this paper
Achieving Nearly-Optimal Regret and Sample Complexity in Dueling Bandits with Applications in Online Recommendations
2025, cites this paper
Natural Policy Gradient for Average Reward Non-Stationary RL
2025, cites this paper
Online Preference Weight Estimation Algorithm with Vanishing Regret for Car-Hailing in Road Network
2024, cites this paper
Rethinking Offline Reinforcement Learning for Sequential Recommendation from A Pair-Wise Q-Learning Perspective
2024, cites this paper
Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation
2024, cites this paper
Energy-Efficient Cooperative Secure Communications in mmWave Vehicular Networks Using Deep Recurrent Reinforcement Learning
2024, cites this paper
RPAF: A Reinforcement Prediction-Allocation Framework for Cache Allocation in Large-Scale Recommender Systems
2024, cites this paper
LDQN: A Lightweight Deep Reinforcement Learning Model
2024, cites this paper
Progressively Learning to Reach Remote Goals by Continuously Updating Boundary Goals
2024, cites this paper
Deep reinforcement learning based on balanced stratified prioritized experience replay for customer credit scoring in peer-to-peer lending
2024, influential citation
Enhancing IoT Intelligence: A Transformer-based Reinforcement Learning Methodology
2024, cites this paper
Brain-Inspired Learning, Perception, and Cognition: A Comprehensive Review
2024, cites this paper
Auditing health-related recommendations in social media: A Case Study of Abortion on YouTube
2024, cites this paper
An Exhaustive Analysis of Reinforcement Learning Implementations within Recommendation System Paradigms
2024, cites this paper
Report on the Search Futures Workshop at ECIR 2024
2024, cites this paper
On Causally Disentangled State Representation Learning for Reinforcement Learning based Recommender Systems
2024, cites this paper
Two-Stage Constrained Actor-Critic for Short Video Recommendation
2023, cites this paper
Cost-Effective Incremental Deep Model: Matching Model Capacity With the Least Sampling
2023, cites this paper
Reinforcement Learning in Natural Language Processing: A Survey
2023, cites this paper
MAGNET: Multi-Interest Attentive Group Recommender with Deep Reinforcement Learning
2023, cites this paper
MASSE: A Multi-Agent Based Air Ticketing Service Selection Approach
2023, cites this paper
Episodic Return Decomposition by Difference of Implicitly Assigned Sub-Trajectory Reward
2023, cites this paper
Multi-gate Mixture-of-Contrastive-Experts with Graph-based Gating Mechanism for TV Recommendation
2023, cites this paper
AURO: Reinforcement Learning for Adaptive User Retention Optimization in Recommender Systems
2023, cites this paper
IPOC: An Adaptive Interval Prediction Model based on Online Chasing and Conformal Inference for Large-Scale Systems
2023, cites this paper
Automatic Web Services Recommendations using the Robust Deep Learning Approach
2023, cites this paper
Collaborative filtering recommendation system based on improved Jaccard similarity
2023, cites this paper
Policy Regularization with Dataset Constraint for Offline Reinforcement Learning
2023, cites this paper
A Systematic Study on Reproducibility of Reinforcement Learning in Recommendation Systems
2023, cites this paper
A dynamic trust management model for vehicular ad hoc networks
2023, cites this paper
Human-inspired framework to accelerate reinforcement learning
2023, cites this paper
Learning to Distinguish Multi-User Coupling Behaviors for TV Recommendation
2023, cites this paper
Reinforcing User Retention in a Billion Scale Short Video Recommender System
2023, cites this paper
Deep reinforcement learning in recommender systems: A survey and new perspectives
2023, cites this paper
Behavioral recommendation engine driven by only non-identifiable user data
2023, cites this paper
AdaRec: Adaptive Sequential Recommendation for Reinforcing Long-term User Engagement
2023, cites this paper
A Safer Approach to Build Recommendation Systems on Unidentifiable Data
2022, cites this paper
Reversible Data Hiding for Color Images Based on Adaptive 3D Prediction-Error Expansion and Double Deep Q-Network
2022, cites this paper
NARS vs. Reinforcement learning: ONA vs. Q-Learning
2022, cites this paper
Modern Value Based Reinforcement Learning: A Chronological Review
2022, cites this paper
Surrogate for Long-Term User Experience in Recommender Systems
2022, cites this paper
Deep Reinforcement Learning for Dynamic Recommendation with Model-agnostic Counterfactual Policy Synthesis
2022, cites this paper
Recommender System using Reinforcement Learning: A Survey
2022, cites this paper
Dynamic Regret of Online Markov Decision Processes
2022, cites this paper
Controlling Underestimation Bias in Reinforcement Learning via Quasi-median Operation
2022, cites this paper
ACP based reinforcement learning for long-term recommender system
2022, cites this paper
Kernelized Deep Learning for Matrix Factorization Recommendation System Using Explicit and Implicit Information
2022, cites this paper
PrefRec: Preference-based Recommender Systems for Reinforcing Long-term User Engagement
2022, cites this paper
Multi-Objective Deep Reinforcement Learning for Recommendation Systems
2022, cites this paper
Deep-Learing based Recommendation System Survey Paper
2022, cites this paper
Open-environment machine learning
2022, cites this paper
Constrained Reinforcement Learning for Short Video Recommendation
2022, cites this paper
PrefRec: Recommender Systems with Human Preferences for Reinforcing Long-term User Engagement
2022, cites this paper
Locality-Sensitive State-Guided Experience Replay Optimization for Sparse Rewards in Online Recommendation
2022, influential citation
IIDQN: An Incentive Improved DQN Algorithm in EBSN Recommender System
2022, cites this paper
State Encoders in Reinforcement Learning for Recommendation: A Reproducibility Study
2022, influential citation
MINDSim: User Simulator for News Recommenders
2022, cites this paper
Plug-and-Play Model-Agnostic Counterfactual Policy Synthesis for Deep Reinforcement Learning-Based Recommendation
2022, cites this paper
YouTube, The Great Radicalizer? Auditing and Mitigating Ideological Biases in YouTube Recommendations
2022, cites this paper
A bibliometric analysis and review on reinforcement learning for transportation applications
2022, cites this paper
Learning Long-Term Reward Redistribution via Randomized Return Decomposition
2021, cites this paper
Reinforcement Learning based Recommender Systems: A Survey
2021, influential citation
Deep Reinforcement Learning-Based Product Recommender for Online Advertising
2021, cites this paper
Tag-Aware Recommender System Based on Deep Reinforcement Learning
2021, cites this paper
Recommendation System with Reasoning Path Based on DQN and Knowledge Graph
2021, cites this paper
Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference
2021, cites this paper
IR-Rec: An interpretive rules-guided recommendation over knowledge graph
2021, cites this paper
A Load Balanced Recommendation Approach
2021, cites this paper
Exploring Clustering-Based Reinforcement Learning for Personalized Book Recommendation in Digital Library
2021, cites this paper
Deep Reinforcement Learning based Group Recommender System
2021, cites this paper
DARES: An Asynchronous Distributed Recommender System Using Deep Reinforcement Learning
2021, cites this paper
A Survey on Deep Reinforcement Learning for Data Processing and Analytics
2021, cites this paper
Reinforcement Learning to Optimize Lifetime Value in Cold-Start Recommendation
2021, cites this paper
Dynamic spectrum access with deep Q-learning in densely occupied and partially observable environments
2021, cites this paper
A Survey of Deep Reinforcement Learning in Recommender Systems: A Systematic Review and Future Directions
2021, cites this paper
A Survey on Reinforcement Learning for Recommender Systems
2021, cites this paper
Top-aware recommender distillation with deep reinforcement learning
2021, cites this paper
Visual Analysis of Deep Q-network
2021, cites this paper
On the Estimation Bias in Double Q-Learning
2021, cites this paper
Optimized Recommender Systems with Deep Reinforcement Learning
2021, cites this paper
Multi-Agent RL Enables Decentralized Spectrum Access in Vehicular Networks
2021, cites this paper
Locality-Sensitive Experience Replay for Online Recommendation
2021, cites this paper
Is High Variance Unavoidable in RL? A Case Study in Continuous Control
2021, cites this paper
Sequential Advertising Agent with Interpretable User Hidden Intents
2020, cites this paper
Knowledge-guided Deep Reinforcement Learning for Interactive Recommendation
2020, cites this paper
Jointly Learning to Recommend and Advertise
2020, cites this paper
Attribute-aware multi-task recommendation
2020, cites this paper
Multi-Channel Sellers Traffic Allocation in Large-scale E-commerce Promotion
2020, cites this paper
Cooperative Multi-Agent Reinforcement Learning in Express System
2020, cites this paper
Bias and Debias in Recommender System: A Survey and Future Directions
2020, cites this paper
Keeping Dataset Biases out of the Simulation: A Debiased Simulator for Reinforcement Learning based Recommender Systems
2020, cites this paper
Learning to Infer User Hidden States for Online Sequential Advertising
2020, cites this paper
Improved Stochastic Synapse Reinforcement Learning for Continuous Actions in Sharply Changing Environments
2020, cites this paper
Incremental preference adjustment: a graph-theoretical approach
2020, cites this paper