
A Perturbation-based Policy Distillation Framework with Generative Adversarial Nets

LiHua Zhang (School of Computer Science and Technology, Soochow University); Quan Liu (School of Computer Science and Technology, Soochow University); Xiongzhen Zhang (School of Computer Science and Technology, Soochow University); Yapeng Xu (School of Computer Science and Technology, Soochow University)

06 Jun 2023

We study the problem of imitation learning in automated decision systems, in which a learner is trained to imitate an expert demonstrator. A widely used approach is adversarial imitation learning, which alternately optimizes a generator (the learner) and a discriminator (the reward function). However, the discriminator is biased during the initial and intermediate stages of training. Consequently, the learner's gradient descent direction is misguided, leading to unstable training and high sample complexity. In this paper, we propose a guidance-based policy distillation algorithm (GIL) for deep imitation learning. First, GIL introduces a teacher model, a guidance-based variational autoencoder, which is pre-trained on expert demonstrations. Then, GIL applies a perturbation-based policy distillation method that uses the teacher model to guide the learner toward the correct optimization direction, enabling the learner to imitate the expert policy with fewer detours. Experimental results show that our approach achieves higher sample efficiency than multiple baselines.
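To make the training scheme sketched in the abstract concrete, the following is a minimal PyTorch sketch of one possible loop: a discriminator update against expert data, followed by a learner update that combines an adversarial term with a distillation term derived from a pre-trained VAE teacher. All names and hyperparameters (TeacherVAE, beta, network sizes) are illustrative assumptions, and for brevity the learner is updated by backpropagating through the discriminator on its own state batch rather than with a full policy-gradient step as in standard adversarial imitation learning; this is not the paper's actual implementation.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim = 8, 2, 4  # assumed toy dimensions

# Learner (generator) and discriminator (reward function surrogate).
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
discriminator = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))

# Teacher VAE: encodes (state, action) pairs, decodes an action from (state, latent).
class TeacherVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(obs_dim + act_dim, 2 * latent_dim)
        self.dec = nn.Linear(obs_dim + latent_dim, act_dim)

    def decode(self, s, z):
        return self.dec(torch.cat([s, z], dim=-1))

teacher = TeacherVAE()  # assumed to be pre-trained on expert demonstrations

opt_d = torch.optim.Adam(discriminator.parameters(), lr=3e-4)
opt_pi = torch.optim.Adam(policy.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()
beta = 0.1  # weight of the distillation (guidance) term; an assumed hyperparameter

def train_step(expert_s, expert_a, learner_s):
    # 1) Discriminator update: expert pairs labeled 1, learner pairs labeled 0.
    learner_a = policy(learner_s).detach()
    d_expert = discriminator(torch.cat([expert_s, expert_a], dim=-1))
    d_learner = discriminator(torch.cat([learner_s, learner_a], dim=-1))
    loss_d = bce(d_expert, torch.ones_like(d_expert)) + bce(d_learner, torch.zeros_like(d_learner))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Learner update: adversarial term (fool the discriminator) plus a
    #    distillation term that matches the teacher's decoded action.
    learner_a = policy(learner_s)
    d_out = discriminator(torch.cat([learner_s, learner_a], dim=-1))
    loss_adv = bce(d_out, torch.ones_like(d_out))
    with torch.no_grad():
        z = torch.randn(learner_s.size(0), latent_dim)
        teacher_a = teacher.decode(learner_s, z)
    loss_distill = ((learner_a - teacher_a) ** 2).mean()
    loss_pi = loss_adv + beta * loss_distill
    opt_pi.zero_grad(); loss_pi.backward(); opt_pi.step()
    return loss_d.item(), loss_pi.item()

# Example usage with random placeholder batches standing in for demonstrations and rollouts.
s_e, a_e = torch.randn(32, obs_dim), torch.randn(32, act_dim)
s_l = torch.randn(32, obs_dim)
print(train_step(s_e, a_e, s_l))
```

The distillation term here plays the role of the "guidance" described in the abstract: when the discriminator's reward signal is still unreliable early in training, the teacher's reconstruction of expert-like actions gives the learner an additional, better-conditioned gradient direction.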