COMPLEMENTARY LEARNING SYSTEM BASED INTRINSIC REWARD IN REINFORCEMENT LEARNING
Zijian Gao (National University of Defense Technology); Kele Xu (National Key Laboratory of Parallel and Distributed Processing (PDL)); Hongda Jia (National University of Defense Technology); Tianjiao Wan (National University of Defense Technology); Ding Bo (National University of Defense Technology); Dawei Feng (National University of Defense Technology); Xinjun Mao (National University of Defense Technology); Huaimin Wang (National University of Defense Technology)
Deep reinforcement learning has achieved encouraging performance in many domains. However, one of its primary challenges, the sparsity of extrinsic rewards, remains far from solved. Complementary learning system theory suggests that effective human learning relies on two complementary learning systems exploiting short-term and long-term memories. Inspired by the fact that humans evaluate curiosity by comparing current observations with historical information, we propose a novel intrinsic reward, named CLS-IR, to address the problems caused by sparse extrinsic rewards. Specifically, we train a self-supervised predictive model whose short-term and long-term memories are maintained via exponential moving averages. We employ the information gain between the two memories as the intrinsic reward, which incurs no additional training cost but leads to better exploration. To investigate the effectiveness of CLS-IR, we conduct extensive experimental evaluations; the results demonstrate that CLS-IR achieves state-of-the-art performance on Atari games and the DeepMind Control Suite.
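
To make the mechanism in the abstract concrete, the following is a minimal PyTorch sketch of one plausible reading: a "short-term" predictive model updated by gradient descent, a "long-term" copy maintained as an exponential moving average (EMA) of its weights, and an intrinsic reward computed as the disagreement between the two memories on the current observation. All names and hyperparameters here (Predictor, ema_update, tau, the squared-error proxy for information gain) are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of a CLS-style two-memory intrinsic reward (assumed design, not the
# paper's official code). The EMA copy is never trained directly, so the
# intrinsic reward adds no extra training cost beyond the predictor itself.
import copy
import torch
import torch.nn as nn

class Predictor(nn.Module):
    """Self-supervised predictive model over encoded observations."""
    def __init__(self, obs_dim: int, hidden: int = 256, out_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

short_term = Predictor(obs_dim=84)      # updated by SGD each step (short-term memory)
long_term = copy.deepcopy(short_term)   # slow EMA copy (long-term memory)
for p in long_term.parameters():
    p.requires_grad_(False)

def ema_update(tau: float = 0.005) -> None:
    """Blend the short-term weights into the long-term memory."""
    with torch.no_grad():
        for p_long, p_short in zip(long_term.parameters(),
                                   short_term.parameters()):
            p_long.mul_(1.0 - tau).add_(tau * p_short)

def intrinsic_reward(obs: torch.Tensor) -> torch.Tensor:
    """Per-observation disagreement between the two memories.

    A squared-error discrepancy is used here as a simple stand-in for the
    information gain between the memories described in the abstract.
    """
    with torch.no_grad():
        return (short_term(obs) - long_term(obs)).pow(2).mean(dim=-1)
```

In use, one would call ema_update after each optimizer step on short_term and add intrinsic_reward(obs) (suitably scaled) to the environment's extrinsic reward; observations the long-term memory already predicts well yield low reward, while novel ones yield high reward, encouraging exploration.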