Decomposition, Interaction, Reconstruction Meets Global Context Learning in Visual Tracking

Huibin Tan (NUDT); Kun Hu (National University of Defense Technology); Mingyu Cao (NUDT); Mengzhu Wang (NUDT); Liyang Xu (National University of Defense Technology); Wenjing Yang (National University of Defense Technology)

SPS

08 Jun 2023

Tensor decomposition and reconstruction attention is a promising approach to global context learning because it remains efficient while avoiding feature compression. To further exploit its potential in visual tracking, we redesign the 3D tensor modeling paradigm as tensor Decomposition, Interaction, Reconstruction attention (DIR), which consists of three functional components: the Tensor Decomposition Module (TDM), the Tensor Interaction Module (TIM), and the Context Reconstruction Module (CRM). Specifically, TDM decomposes a 3D tensor feature into rank-1 context fragments from different dimension views. Its key design choices are circular convolution, which lets it process features of arbitrary scale, and channel-sharing segments, which strengthen the interaction between the two branches of the Siamese network architecture. TIM obtains a tensor plane for each dimension through a cross-similarity operation between the rank-1 tensors and the fused cubic features, bringing more interaction among all feature dimensions. CRM reconstructs the 3D context representation from the outputs of the above modules. In experiments, DIR is embedded into a tracker to verify its effectiveness.
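The abstract describes decomposition into rank-1 fragments and reconstruction only at a high level, so the following is a minimal NumPy sketch of the generic idea, not the authors' DIR implementation. It assumes a CP-style scheme: each of `r` rank-1 fragments is built from one vector per dimension view (channel, height, width) obtained by average pooling, a circular 1-D convolution, and a sigmoid, and the context is reconstructed as a softmax-weighted sum of outer products that then rescales the input. The function names, kernel size, pooling, sigmoid gating, and softmax weights are all hypothetical choices for illustration; TIM's cross-similarity step and the Siamese channel-sharing segments are omitted. The circular convolution shows why such a layer accepts vectors of any length: wrap-around padding keeps the output the same length as the input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def circular_conv1d(v, kernel):
    """1-D convolution with wrap-around padding (cross-correlation form, as in
    deep-learning layers). Output length equals len(v) for any input length."""
    k = len(kernel)
    out = np.zeros_like(v, dtype=float)
    for i, offset in enumerate(range(-(k // 2), k - k // 2)):
        # np.roll(v, -s)[n] == v[(n + s) % len(v)]
        out += kernel[i] * np.roll(v, -offset)
    return out

def rank1_fragments(x, kernels):
    """x: (C, H, W) feature; kernels: r triples of hypothetical 1-D kernels.
    Returns r triples (c, h, w) of rank-1 context vectors, one per view."""
    frags = []
    for kc, kh, kw in kernels:
        c = sigmoid(circular_conv1d(x.mean(axis=(1, 2)), kc))  # channel view, length C
        h = sigmoid(circular_conv1d(x.mean(axis=(0, 2)), kh))  # height view, length H
        w = sigmoid(circular_conv1d(x.mean(axis=(0, 1)), kw))  # width view, length W
        frags.append((c, h, w))
    return frags

def reconstruct(frags, lam):
    """Weighted sum of outer products c (x) h (x) w -> (C, H, W) context map."""
    lam = np.exp(lam - lam.max())
    lam = lam / lam.sum()  # softmax weights over the r fragments
    return sum(l * np.einsum("i,j,k->ijk", c, h, w) for l, (c, h, w) in zip(lam, frags))

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 13, 17))  # arbitrary, non-square spatial size
r = 4
kernels = [tuple(rng.standard_normal(3) for _ in range(3)) for _ in range(r)]
ctx = reconstruct(rank1_fragments(x, kernels), rng.standard_normal(r))
y = x * ctx  # context-reweighted feature, same shape as x
print(ctx.shape)  # (8, 13, 17)
```

Because every fragment is a product of sigmoids and the weights sum to one, the reconstructed map lies in (0, 1) and acts as a full 3D attention over the feature without ever forming a (CHW x CHW) affinity matrix.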

Tags:

Imaging and video networks