SPATIO-TEMPORAL MOTION AGGREGATION NETWORK FOR VIDEO ACTION DETECTION

Hongcheng Zhang, Xu Zhao


Length: 00:08:43

10 May 2022

Recognizing action patterns and detecting action instances are vital to spatio-temporal action detection, which aims to recognize the actions of interest in untrimmed videos and localize them in both space and time. Mainstream action tubelet detectors, however, ignore the conflict between the features needed for localization and those needed for classification, and use localization features for temporal modeling, which leads to ineffective action classification. In this paper, we propose the Spatio-Temporal Motion Aggregation mechanism, which integrates local motion features from a short-term snippet with longer-range spatio-temporal information to predict the action category. We design the Class-Agnostic Center Localization module to localize action instance centers in a class-agnostic manner. In addition, Movement and Size Regression is proposed for movement estimation and spatial extent detection, using Gaussian kernels to encode training samples. These three modules work together to generate tubelet detection results, which can be further linked with a matching strategy to yield video-level tubes. Our detector achieves state-of-the-art performance in both frame-mAP and video-mAP on the UCF-24 and JHMDB datasets.
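The abstract says that Gaussian kernels are used to encode training samples but gives no further detail. The sketch below shows one common way such an encoding is done in center-based detectors: a 2D Gaussian is placed on a ground-truth center so that nearby grid cells receive soft target values. This is an assumption for illustration only, not the authors' implementation. The function name `gaussian_target`, the grid size, and the `sigma` value are all hypothetical.

```python
import math


def gaussian_target(height, width, center, sigma):
    """Build a height x width grid of soft targets (hypothetical helper).

    Values peak at 1.0 on `center` = (cx, cy) and decay with an
    isotropic 2D Gaussian of standard deviation `sigma`.
    """
    cx, cy = center
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


# Illustrative values only: a 5x5 grid with the instance center at (2, 2).
heatmap = gaussian_target(height=5, width=5, center=(2, 2), sigma=1.0)
```

In this kind of encoding, cells close to the true center are penalized less than distant ones during training, which is the usual motivation for choosing a Gaussian over a single hard positive location.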

Tags: video understanding, anchor-free detector, spatio-temporal action detection, video action detection


Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
