REUSE NON-TERRAIN POLICIES FOR LEARNING TERRAIN-ADAPTIVE HUMANOID LOCOMOTION SKILLS
Hao Yu, Yuchen Liang, Yuehu Liu, Chi Zhang
SPS
Terrain-adaptive locomotion skills are prerequisites for humanoids to traverse complex environments that include barriers, gaps, and caves. Previous imitation learning methods require large amounts of motion sampling to obtain an optimal policy adapted to a new terrain. However, we observe that the effectiveness of such a costly policy on certain complex terrains is quite similar to that of existing policies pretrained on flat ground. Inspired by this finding, we present a few-shot imitation learning (FSIL) framework that reuses these pretrained policies as learning primitives for adaptation to new terrain. Specifically, we propose a meta-controller, trained with a few control sequences, that allocates a combination weight to each primitive for different terrains. Empirical studies in mainstream problem settings show that our method maintains a high passing rate within a few shots on new terrain while avoiding massive unnecessary data sampling. Further theoretical analysis shows that, whereas other methods sample data in new terrain environments hundreds or thousands of times, our method requires at most 52 shots to achieve an optimal control policy for adaptation to certain terrains.
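To make the idea of a meta-controller that allocates combination weights concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear gate followed by a softmax and an additive blend of primitive actions; the abstract does not specify the gating model or the composition rule. All names here (`MetaController`, `combined_action`, the toy primitives) are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Policy = Callable[[Sequence[float]], Vector]  # maps a state to an action


def softmax(logits: Sequence[float]) -> Vector:
    """Turn arbitrary scores into non-negative weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MetaController:
    """Hypothetical linear gate: terrain features -> one weight per primitive."""

    def __init__(self, gate: List[List[float]]):
        # One row of coefficients per pretrained primitive (assumed form).
        self.gate = gate

    def weights(self, terrain: Sequence[float]) -> Vector:
        logits = [sum(g * t for g, t in zip(row, terrain)) for row in self.gate]
        return softmax(logits)


def combined_action(
    meta: MetaController,
    primitives: Sequence[Policy],
    state: Sequence[float],
    terrain: Sequence[float],
) -> Vector:
    """Blend primitive actions with the meta-controller's weights.

    The weighted sum is an assumption made for illustration only.
    """
    w = meta.weights(terrain)
    actions = [p(state) for p in primitives]
    dim = len(actions[0])
    return [
        sum(w[k] * actions[k][j] for k in range(len(actions)))
        for j in range(dim)
    ]


if __name__ == "__main__":
    # Toy stand-ins for policies pretrained on flat ground.
    walk: Policy = lambda s: [1.0, 0.0]
    jump: Policy = lambda s: [0.0, 1.0]
    meta = MetaController([[0.0, 0.0], [10.0, 0.0]])
    print(combined_action(meta, [walk, jump], [0.0], [1.0, 0.0]))
```

In this sketch only the gate coefficients would be fitted from the few control sequences collected on the new terrain, while the primitives stay frozen. That is one plausible reading of why few shots could suffice, offered here as an illustration and not as a claim about the paper's analysis.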