Learning Spatio-Temporal Convolutional Network For Real-Time Object Tracking
Hanzao Chen, Xiaofen Xing, Xiangmin Xu
SPS
Siamese tracking networks have shown great potential in achieving a balance between accuracy and beyond-real-time speed. However, most existing Siamese trackers consider only the appearance features of the first frame and hardly benefit from inter-frame information. The lack of up-to-date temporal information degrades tracking performance under challenges such as deformation and partial occlusion. In this paper, we focus on making use of the rich information in the latest consecutive frames to improve the feature representation of the initial template frame. Specifically, the latest frames are passed through a 3D convolution to generate an attention map, which is then point-wise multiplied with the features of the first frame to obtain the updated template. With this attention map, the template can adaptively cope with deformation and occlusion of the target. Since the first frame is always used as the basis of the template, no cumulative error arises from using the latest frames for attention. Because the 2D convolution is shared across all frames, its feature maps can be reused, so the added module incurs almost no extra computation time. The module is easily embedded into different Siamese trackers. Experiments verify that the module significantly improves tracking performance across different backbones.
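To make the template-update mechanism concrete, the following is a minimal NumPy sketch of the idea described above: features of the latest frames are aggregated by a 3D convolution into an attention map, which point-wise modulates the first-frame template features. All shapes, the single-filter 3D convolution, and the sigmoid gating are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive 'valid' 3-D convolution of a (T, H, W) volume with a
    (t, h, w) kernel, returning a (T-t+1, H-h+1, W-w+1) output."""
    T, H, W = volume.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(volume[i:i+t, j:j+h, k:k+w] * kernel)
    return out

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

np.random.seed(0)

# Hypothetical feature maps (single channel for clarity):
template_feat = np.random.rand(6, 6)   # 2-D features of the first frame
recent_feats  = np.random.rand(3, 8, 8)  # features of the 3 latest frames
kernel        = np.random.rand(3, 3, 3)  # one learned 3-D filter (assumed)

# 3-D convolution over the temporal stack collapses the time axis,
# yielding a spatial attention map matched to the template size.
attn = sigmoid(conv3d_valid(recent_feats, kernel))  # shape (1, 6, 6)

# Point-wise multiplication updates the template; the first frame
# remains the fixed basis, so no error accumulates over time.
updated_template = template_feat * attn[0]
```

In a real tracker the 2D backbone features of each frame would be computed once and reused both for this temporal-attention branch and for the Siamese correlation, which is why the module adds almost no runtime cost.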