Complex Spatial-Temporal Attention Aggregation for Video Person Re-Identification
Wenjie Ding, Xing Wei, Xiaopeng Hong, Yihong Gong
Video-based person re-identification (Re-ID) aims to match pedestrian tracklets of the same identity captured by different cameras. Existing works usually compute the video-level feature representation via simple frame-level feature aggregation, such as average pooling and max pooling. However, the performance of such methods degrades severely under low signal-to-noise ratios and partial occlusions. In this paper, we propose a novel Complex Spatial-Temporal Attention Aggregation (CAA) method, which fully exploits the discriminative information in the spatial-temporal dimension by combining two aggregation schemes, namely region-aware aggregation and region-regardless aggregation. We evaluate the proposed method on three widely used video Re-ID datasets: MARS, iLIDS-VID, and PRID-2011. The experimental results demonstrate that the proposed method outperforms the state of the art.
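To make the contrast with simple pooling concrete, the following is a minimal sketch of attention-weighted temporal aggregation of frame-level features, set against an average-pooling baseline. The module name, feature dimension, and scoring layer are assumptions for illustration only and do not reproduce the authors' exact architecture or the region-aware branch.

```python
# Hypothetical sketch: attention-weighted temporal aggregation vs. average pooling.
# Shapes, names, and the linear scoring function are illustrative assumptions,
# not the paper's exact CAA design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalAttentionAggregation(nn.Module):
    """Aggregate per-frame features (B, T, D) into a clip-level feature (B, D)."""

    def __init__(self, feat_dim: int = 2048):
        super().__init__()
        # Scores each frame; noisy or occluded frames should receive low weight.
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (B, T, D) frame-level features from a CNN backbone
        weights = F.softmax(self.score(frame_feats), dim=1)   # (B, T, 1)
        return (weights * frame_feats).sum(dim=1)             # (B, D)


if __name__ == "__main__":
    feats = torch.randn(4, 8, 2048)           # 4 tracklets, 8 frames each
    baseline = feats.mean(dim=1)              # simple average-pooling baseline
    aggregated = TemporalAttentionAggregation(2048)(feats)
    print(baseline.shape, aggregated.shape)   # both torch.Size([4, 2048])
```

In this sketch, average pooling weights every frame equally, whereas the learned attention weights can down-weight frames corrupted by noise or occlusion before aggregation, which is the failure mode the abstract attributes to simple pooling.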