COMPLEMENTARY LEARNING SYSTEM BASED INTRINSIC REWARD IN REINFORCEMENT LEARNING

Zijian Gao (National University of Defense Technology); Kele Xu (National Key Laboratory of Parallel and Distributed Processing (PDL)); Hongda Jia (National University of Defense Technology); Tianjiao Wan (National University of Defense Technology); Ding Bo (National University of Defense Technology); Dawei Feng (National University of Defense Technology); Xinjun Mao (National University of Defense Technology); Huaimin Wang (National University of Defense Technology)

06 Jun 2023

Deep reinforcement learning has achieved encouraging performance in many domains. However, one of its primary challenges, the sparsity of extrinsic rewards, remains far from solved. Complementary learning system theory suggests that effective human learning relies on two complementary learning systems built on short-term and long-term memories. Inspired by the fact that humans evaluate curiosity by comparing current observations with historical information, we propose a novel intrinsic reward, namely CLS-IR, which aims to address the problems caused by sparse extrinsic rewards. Specifically, we train a self-supervised predictive model with short-term and long-term memories maintained via exponential moving averages. We employ the information gain between the two memories as the intrinsic reward, which incurs no additional training cost but leads to better exploration. To investigate the effectiveness of CLS-IR, we conduct extensive experimental evaluations; the results demonstrate that CLS-IR achieves state-of-the-art performance on Atari games and the DeepMind Control Suite.
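The abstract's core mechanism can be illustrated with a minimal sketch: a short-term memory (the online model's weights) is trained normally, a long-term memory tracks it via an exponential moving average, and the disagreement between the two serves as an intrinsic reward. All names, shapes, and the squared-error proxy for information gain below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical sketch of a CLS-style intrinsic reward.
# Linear maps stand in for the paper's self-supervised predictive model.

rng = np.random.default_rng(0)

def ema_update(long_term, short_term, tau=0.99):
    """Long-term memory slowly tracks the short-term weights via EMA."""
    return tau * long_term + (1.0 - tau) * short_term

def intrinsic_reward(obs, w_short, w_long):
    """Proxy for information gain: disagreement between the two
    memories' predictions for the same observation (assumed metric)."""
    pred_short = obs @ w_short
    pred_long = obs @ w_long
    return float(np.mean((pred_short - pred_long) ** 2))

obs_dim, emb_dim = 8, 4
w_short = rng.normal(size=(obs_dim, emb_dim))
w_long = w_short.copy()                 # memories start in agreement

obs = rng.normal(size=(1, obs_dim))
r0 = intrinsic_reward(obs, w_short, w_long)  # zero: no disagreement yet

# Simulate a learning step applied to the short-term system only
w_short = w_short + 0.1 * rng.normal(size=w_short.shape)
r1 = intrinsic_reward(obs, w_short, w_long)  # positive: memories diverged

w_long = ema_update(w_long, w_short)
r2 = intrinsic_reward(obs, w_short, w_long)  # shrinks as long-term catches up
```

Because the long-term memory is a byproduct of the EMA update (as in target networks), computing this reward adds no separate training objective, matching the abstract's claim of no additional training cost.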
