Distilling DETR-Like Detectors With Instance-Aware Feature
Honglie Wang, Jian Xu, Shouqian Sun
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 00:05:28
Human Pose Estimation (HPE) is a long-standing yet challenging task in computer vision. The nature of the problem requires comprehensive global contextual reasoning among joints in different locations. In this work, we explore how to incorporate two popular and effective concepts, self-attention and Graph Neural Networks (GNNs), to model long-range information in HPE. We study three ways of implementing self-attention on 3D feature maps; the best result is achieved with the channel-position version. Accuracy is further improved by refining the queries with an efficient channel-wise parallel GNN that explicitly models the graphical relationships between human joints. The approach improves prediction accuracy over strong baseline models and achieves state-of-the-art results.
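To make the idea of self-attention over a 3D feature map concrete, the following is a minimal sketch in pure Python. It is a generic illustration, not the authors' implementation: the abstract does not define the three variants it compares, so the two tokenisations below (`position_tokens` and `channel_tokens`), the identity query/key/value projections, and the toy 3 x 2 x 2 map are all assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections), so there are no learned weights here.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def position_tokens(fmap):
    """Hypothetical helper: C x H x W map -> H*W tokens of dimension C."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][h][w] for c in range(C)] for h in range(H) for w in range(W)]


def channel_tokens(fmap):
    """Hypothetical helper: C x H x W map -> C tokens of dimension H*W."""
    return [[v for row in channel for v in row] for channel in fmap]


# Toy feature map with C=3 channels and a 2 x 2 spatial grid (made-up values).
fmap = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.0, 2.0], [2.0, 0.0]],
]

pos_out = self_attention(position_tokens(fmap))   # 4 tokens, each of dimension 3
chan_out = self_attention(channel_tokens(fmap))   # 3 tokens, each of dimension 4
```

Attending across positions lets every spatial location draw on every other location, which is one way to capture the long-range context between joints that the abstract describes. Attending across channels instead mixes whole feature channels. A real model would add learned projections and would typically use a tensor library rather than nested lists.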