TOP-K VISUAL TOKENS TRANSFORMER: SELECTING TOKENS FOR VISIBLE-INFRARED PERSON RE-IDENTIFICATION
Bin Yang (Wuhan University); Jun Chen (Wuhan University); Mang Ye (Wuhan University)
Visible-infrared person re-identification (VI-ReID), which matches person images across the visible and infrared modalities, is an important and challenging task. Existing works mainly focus on reducing the modality gap with Convolutional Neural Networks (CNNs). However, features extracted by CNNs may contain identity-irrelevant information, which inevitably weakens their discriminability. To address this issue, this paper introduces a Top-K Visual Tokens Transformer (TVTR) framework that uses a top-k visual token selection module to select the k most discriminative visual patches, reducing the distraction of identity-irrelevant information and learning discriminative features. Furthermore, a global-local circle loss is developed to optimize TVTR so that cross-modality positive pairs are pulled together and negative pairs are pushed apart. Experimental results on the SYSU-MM01 and RegDB datasets demonstrate the superiority of our method. The source code will be released.
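The core top-k selection idea can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the scoring signal (here, the [CLS] token's attention over patch tokens) and the function name `select_top_k_tokens` are assumptions made for demonstration.

```python
# Sketch (assumed, not the authors' code): rank patch tokens by a relevance
# score such as CLS attention, keep the top k, and drop the rest as likely
# identity-irrelevant.

def select_top_k_tokens(patch_tokens, cls_attention, k):
    """Keep the k patch tokens with the highest relevance scores.

    patch_tokens  : list of token embeddings (one list of floats per patch)
    cls_attention : relevance score for each patch (assumed CLS attention)
    k             : number of tokens to retain
    """
    # Indices of patches ranked by score, highest first.
    ranked = sorted(range(len(patch_tokens)),
                    key=lambda i: cls_attention[i], reverse=True)
    # Re-sort the kept indices to preserve original spatial order.
    keep = sorted(ranked[:k])
    return [patch_tokens[i] for i in keep], keep

# Toy example: 5 patches with scalar embeddings.
tokens = [[0.1], [0.9], [0.3], [0.7], [0.2]]
scores = [0.05, 0.40, 0.10, 0.30, 0.15]
kept, idx = select_top_k_tokens(tokens, scores, k=3)
# idx -> [1, 3, 4]: the three highest-scoring patches, in spatial order.
```

In a full transformer pipeline the kept tokens would then be fed to subsequent layers, so later computation attends only to the selected discriminative patches.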