HIERARCHICAL SPATIAL-TEMPORAL TRANSFORMER WITH MOTION TRAJECTORY FOR INDIVIDUAL ACTION AND GROUP ACTIVITY RECOGNITION
Xiaolin Zhu (Xiangtan University); Dongli Wang (Xiangtan University); Yan ZHOU (Xiangtan University)
Group activity recognition, which aims to simultaneously understand individual actions and the group activity in video clips, plays a fundamental role in video analysis. In this paper, we propose a novel reasoning network, the Hierarchical Spatial-Temporal Transformer (HSTT), for individual action and group activity recognition. HSTT adaptively and jointly captures spatial-temporal dynamic interactions of varying strength among actors. Specifically, we first design a hierarchical spatial-temporal Transformer that captures relationships at different levels in order to handle the unequal interaction relationships among actors. Furthermore, our proposed spatial-temporal Transformer (STT) block fully mines long-range spatial-temporal interactions by virtue of a merge function and a cross-attention mechanism. In addition, we adopt a motion trajectory branch that provides complementary dynamic features to improve recognition performance.
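As a rough illustration of the cross-attention idea mentioned above, the following minimal sketch lets per-actor (spatial) tokens attend to per-frame (temporal) tokens using standard scaled dot-product attention. It is not the STT block itself: the abstract does not specify the merge function, the learned projections, or the tensor layout, so the function names (`softmax`, `cross_attention`), the token shapes, and the random inputs are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Scaled dot-product cross-attention (hypothetical helper).

    queries: (num_queries, dim) tokens that ask for information.
    context: (num_context, dim) tokens used as both keys and values.
    Learned query/key/value projections are omitted for brevity.
    """
    dim = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(dim)  # (num_queries, num_context)
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ context, weights


# Assumed toy sizes: 12 actors, 10 frames, 16-dimensional features.
rng = np.random.default_rng(0)
num_actors, num_frames, dim = 12, 10, 16
spatial_tokens = rng.standard_normal((num_actors, dim))   # one token per actor
temporal_tokens = rng.standard_normal((num_frames, dim))  # one token per frame

# Spatial tokens query the temporal tokens, so each actor feature is
# updated with a weighted mixture of the per-frame features.
fused, weights = cross_attention(spatial_tokens, temporal_tokens)
```

In this sketch `fused` has one row per actor and `weights` records how strongly each actor token attends to each frame token; a full model would typically add learned projections, multiple heads, and residual connections around this step.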