COMPLEMENTARY LEARNING SYSTEM BASED INTRINSIC REWARD IN REINFORCEMENT LEARNING

Zijian Gao (National University of Defense Technology); Kele Xu (National Key Laboratory of Parallel and Distributed Processing (PDL)); Hongda Jia (National University of Defense Technology); Tianjiao Wan (National University of Defense Technology); Ding Bo (National University of Defense Technology); Dawei Feng (National University of Defense Technology); Xinjun Mao (National University of Defense Technology); Huaimin Wang (National University of Defense Technology)

06 Jun 2023

Deep reinforcement learning has achieved encouraging performance in many domains. However, one of its primary challenges, the sparsity of extrinsic rewards, remains far from solved. Complementary learning system theory suggests that effective human learning relies on two complementary learning systems built on short-term and long-term memories. Inspired by the fact that humans evaluate curiosity by comparing current observations with historical information, we propose a novel intrinsic reward, namely CLS-IR, which aims to address the problems caused by sparse extrinsic rewards. Specifically, we train a self-supervised predictive model with short-term and long-term memories maintained via exponential moving averages. We employ the information gain between the two memories as the intrinsic reward, which incurs no additional training cost but leads to better exploration. To investigate the effectiveness of CLS-IR, we conduct extensive experimental evaluations; the results demonstrate that CLS-IR achieves state-of-the-art performance on Atari games and the DeepMind Control Suite.
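The abstract's core mechanism can be illustrated with a minimal sketch: a short-term memory (the online model's weights) is trained normally, a long-term memory tracks it via an exponential moving average, and the disagreement between the two serves as an intrinsic reward. All names, shapes, and the squared-error proxy for information gain below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical sketch of a CLS-style intrinsic reward.
# Linear maps stand in for the paper's self-supervised predictive model.

rng = np.random.default_rng(0)

def ema_update(long_term, short_term, tau=0.99):
    """Long-term memory slowly tracks the short-term weights via EMA."""
    return tau * long_term + (1.0 - tau) * short_term

def intrinsic_reward(obs, w_short, w_long):
    """Proxy for information gain: disagreement between the two
    memories' predictions for the same observation (assumed metric)."""
    pred_short = obs @ w_short
    pred_long = obs @ w_long
    return float(np.mean((pred_short - pred_long) ** 2))

obs_dim, emb_dim = 8, 4
w_short = rng.normal(size=(obs_dim, emb_dim))
w_long = w_short.copy()                 # memories start in agreement

obs = rng.normal(size=(1, obs_dim))
r0 = intrinsic_reward(obs, w_short, w_long)  # zero: no disagreement yet

# Simulate a learning step applied to the short-term system only
w_short = w_short + 0.1 * rng.normal(size=w_short.shape)
r1 = intrinsic_reward(obs, w_short, w_long)  # positive: memories diverged

w_long = ema_update(w_long, w_short)
r2 = intrinsic_reward(obs, w_short, w_long)  # shrinks as long-term catches up
```

Because the long-term memory is a byproduct of the EMA update (as in target networks), computing this reward adds no separate training objective, matching the abstract's claim of no additional training cost.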
