Shaoqi Sun (National University of Defense Technology); Yuanzhao Zhai (National University of Defense Technology); Kele Xu (National Key Laboratory of Parallel and Distributed Processing (PDL)); Dawei Feng (National University of Defense Technology); Ding Bo (National University of Defense Technology)

06 Jun 2023

Multi-Agent Reinforcement Learning (MARL) has recently achieved promising performance in many collaborative decision-making tasks. However, sparse team rewards remain a key bottleneck for MARL and can induce homogeneous behavior among agents. To address these issues, we propose a Progressive Diversifying Policy (PDP) algorithm. Specifically, we actively amplify the diversity between agents' policies during learning and exploit this diversity as an additional intrinsic reward for MARL. Furthermore, we propose a progressive diversity boosting policy to find a better team policy. With these improvements, our method can handle sparse team rewards and mitigate homogeneous agent behavior. We conduct experiments on several widely used MARL environments; the results show that PDP achieves state-of-the-art performance while maintaining a competitive convergence speed.
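To make the core idea concrete, here is a minimal sketch (not the paper's implementation) of using inter-agent policy diversity as an intrinsic reward: the mean pairwise KL divergence between agents' action distributions is scaled by a weight and added to the sparse team reward. The function names and the `beta` weight are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def policy_diversity_bonus(action_probs, beta=0.1):
    """Illustrative diversity bonus: mean pairwise KL divergence between
    agents' action distributions over a shared discrete action space.
    `beta` is an assumed scaling hyperparameter, not from the paper."""
    n = len(action_probs)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i != j:
                p, q = action_probs[i], action_probs[j]
                # KL(p || q); assumes strictly positive probabilities
                total += float(np.sum(p * np.log(p / q)))
                pairs += 1
    return beta * total / pairs

def shaped_reward(team_reward, action_probs, beta=0.1):
    """Sparse team reward augmented with the diversity intrinsic reward."""
    return team_reward + policy_diversity_bonus(action_probs, beta)
```

Under this sketch, identical policies yield a zero bonus (no extra signal when agents behave homogeneously), while divergent policies receive a positive intrinsic reward even when the team reward is zero, which is the mechanism the abstract describes for escaping the sparse-reward regime.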
