MODULATION-BASED CENTER ALIGNMENT AND MOTION MINING FOR SPATIAL TEMPORAL ACTION DETECTION

Weiji Zhao (Shanghai Jiao Tong University); KeFeng Huang (Shanghai Jianke Engineering Consulting Co.,Ltd); Chongyang Zhang (Shanghai Jiao Tong University)

06 Jun 2023

The goal of spatial-temporal action detection is to generate spatially and temporally aligned action tubes. Most existing 2D CNN-based solutions aggregate context from temporally adjacent frames directly, without alignment. The resulting misaligned spatial-temporal contextual features can lead to chaotic representations and misaligned action instances. Moreover, most existing methods fail to exploit motion dependencies efficiently. In this paper, we propose Modulation-based Center Alignment (MCA) and Sparse Valuable Motion Mining (SVMM) for more accurate action detection. First, a key-frame-based modulation built on deformable convolution aligns action centers across temporal frames; then, a motion-region-guided sparse self-attention mines valuable motion. Experimental results on two widely used benchmarks, JHMDB and UCF101-24, show that our framework significantly outperforms current 2D CNN-based methods.
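
The abstract gives no implementation details, so the following is only a rough NumPy sketch of what "motion-region-guided sparse self-attention" could look like in general, not the authors' SVMM module. Everything in it is an assumption made for illustration: the function name `sparse_motion_attention`, the `keep_ratio` parameter, the use of a mean absolute feature difference as the motion score, and the plain dot-product attention without learned projections.

```python
import numpy as np

def sparse_motion_attention(feats, key_idx, keep_ratio=0.25):
    """Illustrative only; all names and choices here are assumptions.

    feats:      array of shape (T, N, C), per-frame features flattened
                over N spatial positions.
    key_idx:    index of the key frame within the T frames.
    keep_ratio: fraction of positions treated as "motion regions".
    Returns the updated key-frame features (N, C) and the selected indices.
    """
    T, N, C = feats.shape
    key = feats[key_idx]                                   # (N, C)

    # Assumed motion score: mean absolute difference between the key frame
    # and every frame, averaged over frames and channels.
    motion = np.abs(feats - key[None]).mean(axis=(0, 2))   # (N,)

    # Keep only the top-k positions with the largest motion score.
    k = max(1, int(round(keep_ratio * N)))
    selected = np.argsort(-motion, kind="stable")[:k]

    # Dot-product self-attention restricted to the selected tokens.
    tokens = key[selected]                                 # (k, C)
    scores = tokens @ tokens.T / np.sqrt(C)
    scores -= scores.max(axis=1, keepdims=True)            # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)

    # Residual update at the selected positions; other positions are untouched.
    out = key.copy()
    out[selected] = tokens + weights @ tokens
    return out, selected
```

Restricting attention to `k` selected positions reduces the attention matrix from N x N to k x k, which is the general efficiency argument for sparse attention; how the paper actually selects motion regions is not stated in the abstract.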

Tags: Image and video representation

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
