
MODULATION-BASED CENTER ALIGNMENT AND MOTION MINING FOR SPATIAL TEMPORAL ACTION DETECTION

Weiji Zhao (Shanghai Jiao Tong University); KeFeng Huang (Shanghai Jianke Engineering Consulting Co., Ltd); Chongyang Zhang (Shanghai Jiao Tong University)

06 Jun 2023

The goal of spatial-temporal action detection is to generate spatially and temporally aligned action tubes. Most existing 2D CNN-based solutions aggregate context from temporally adjacent frames directly, without alignment. The resulting misaligned spatial-temporal features can lead to chaotic representations and misaligned action instances. Moreover, most existing methods fail to exploit motion dependencies efficiently. In this paper, we propose Modulation-based Center Alignment (MCA) and Sparse Valuable Motion Mining (SVMM) for more accurate action detection. First, a key-frame-based modulation built on deformable convolution aligns action centers across frames; then, a sparse self-attention guided by motion regions mines valuable motion information. Experimental results on two widely used benchmarks, JHMDB and UCF101-24, show that our framework significantly outperforms current 2D CNN-based methods.
