GaitCoTr: Improved Spatial-Temporal Representation for Gait Recognition with a Hybrid Convolution-Transformer Framework
Jingqi Li (Fudan University); Yuzhen Zhang (Fudan University); Hongming Shan (Fudan University); Junping Zhang (Fudan University)
This work presents GaitCoTr, a novel hybrid convolution-transformer framework for gait recognition. The framework captures appearance and short-term motion features with a CNN-based branch, GaCo, and extracts long-term motion features with a ViT-variant branch, GaTr, yielding a more comprehensive spatial-temporal representation of gait. To unleash the potential of this hybrid framework and extract rich, generalized motion features, we propose GaTr, a new transformer variant tailored for gait, comprising a temporally shifted tokenizer, a length-flexible position embedding, and an inter-frame encoder. In addition, we introduce an auxiliary task, view label prediction, to disentangle view information from identity information. Extensive experiments on two well-known gait benchmark datasets, CASIA-B and OU-MVLP, demonstrate the superior performance of the proposed GaitCoTr.
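To make the dual-branch idea concrete, the sketch below shows a minimal PyTorch skeleton of this kind of hybrid design: a 3D-convolutional branch standing in for GaCo, a per-frame-token transformer branch standing in for GaTr (with a learned position embedding truncated to the clip length, loosely echoing length-flexible position embedding), and two output heads so a view-prediction loss can be trained alongside the identity loss. All module structures, dimensions, and the fusion-by-concatenation choice are illustrative assumptions, not the paper's actual GaitCoTr architecture; the temporally shifted tokenizer and inter-frame encoder are not reproduced here.

```python
# Illustrative sketch only; names, sizes, and fusion strategy are assumptions.
import torch
import torch.nn as nn


class ConvBranch(nn.Module):
    """Stand-in for GaCo: 3D convolutions over silhouette clips capture
    appearance and short-term motion."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, out_dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),  # pool over time and space
        )

    def forward(self, x):              # x: (B, 1, T, H, W)
        return self.net(x).flatten(1)  # (B, out_dim)


class TransformerBranch(nn.Module):
    """Stand-in for GaTr: one token per frame, learned position embeddings,
    and self-attention modeling long-term motion across the sequence."""
    def __init__(self, dim=128, max_len=64):
        super().__init__()
        self.tokenize = nn.Sequential(  # frame -> token
            nn.AdaptiveAvgPool2d(8), nn.Flatten(1), nn.Linear(8 * 8, dim)
        )
        self.pos = nn.Parameter(torch.zeros(1, max_len, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):                        # x: (B, 1, T, H, W)
        b, _, t, h, w = x.shape
        frames = x.transpose(1, 2).reshape(b * t, 1, h, w)
        tokens = self.tokenize(frames).view(b, t, -1)
        tokens = tokens + self.pos[:, :t]        # truncate to clip length T
        return self.encoder(tokens).mean(dim=1)  # (B, dim)


class HybridGaitNet(nn.Module):
    """Concatenates both branches; an identity head plus an auxiliary
    view-label head, mirroring the view/ID disentanglement idea."""
    def __init__(self, dim=128, num_ids=74, num_views=11):
        super().__init__()
        self.conv = ConvBranch(dim)
        self.trans = TransformerBranch(dim)
        self.id_head = nn.Linear(2 * dim, num_ids)
        self.view_head = nn.Linear(2 * dim, num_views)

    def forward(self, x):
        feat = torch.cat([self.conv(x), self.trans(x)], dim=1)
        return self.id_head(feat), self.view_head(feat)


if __name__ == "__main__":
    model = HybridGaitNet()
    clip = torch.randn(2, 1, 30, 64, 44)  # CASIA-B-style silhouette clips
    id_logits, view_logits = model(clip)
    print(id_logits.shape, view_logits.shape)  # (2, 74) and (2, 11)
```

In such a setup, training would typically combine a recognition loss on `id_logits` with a cross-entropy term on `view_logits` (e.g. `loss = id_loss + lambda_view * view_loss`); the head sizes above assume the common CASIA-B split of 74 training subjects and its 11 camera views.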