GROUPED TEMPORAL ENHANCEMENT MODULE FOR HUMAN ACTION RECOGNITION

Hong Liu, Bin Ren, Mengyuan Liu, Runwei Ding

Length: 12:20

27 Oct 2020

Temporal information is an important cue for recognizing human actions in videos. Unlike 2D CNNs, which efficiently capture only spatial information, 3D CNNs capture both spatial and temporal information, but at the expense of high computational cost. Going beyond both approaches, this paper presents a Grouped Temporal Enhancement (GTE) module that outperforms 3D CNNs while requiring a computational cost similar to that of 2D CNNs. The GTE module first decomposes an input video into spatial and temporal groups along the channel dimension, and then applies a learnable temporal shift (LTS) operation for efficient temporal modeling. Finally, a 2D convolution filter is used to enhance the spatial modeling ability of LTS. Extensive experiments on three benchmark datasets validate the effectiveness of the method.
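The first two steps described in the abstract (channel grouping, then a learnable temporal shift) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not specify how LTS is parameterized, so the sketch assumes a per-channel 3-tap filter along the time axis with zero padding, of which a fixed one-frame shift is a special case. The names `split_groups`, `learnable_temporal_shift`, `gte_sketch`, `n_temporal`, and `kernels` are hypothetical, each channel is reduced to a single value per frame for brevity, and the final 2D convolution step is omitted.

```python
def split_groups(frames, n_temporal):
    """Split each frame's channels into (spatial, temporal) groups.

    frames is a list of T frames, each a list of C channel values.
    Assumption: the first n_temporal channels form the temporal group.
    """
    temporal = [f[:n_temporal] for f in frames]
    spatial = [f[n_temporal:] for f in frames]
    return spatial, temporal


def learnable_temporal_shift(temporal, kernels):
    """Apply a per-channel 3-tap filter over time with zero padding.

    kernels[c] = (w_prev, w_cur, w_next). In training these weights would
    be learned; (1, 0, 0) copies the previous frame and (0, 0, 1) the next,
    which reproduces a fixed temporal shift.
    """
    num_frames = len(temporal)
    out = []
    for t in range(num_frames):
        row = []
        for c, (w_prev, w_cur, w_next) in enumerate(kernels):
            prev = temporal[t - 1][c] if t > 0 else 0.0
            nxt = temporal[t + 1][c] if t < num_frames - 1 else 0.0
            row.append(w_prev * prev + w_cur * temporal[t][c] + w_next * nxt)
        out.append(row)
    return out


def gte_sketch(frames, n_temporal, kernels):
    """Group channels, shift the temporal group, and re-join the groups."""
    spatial, temporal = split_groups(frames, n_temporal)
    shifted = learnable_temporal_shift(temporal, kernels)
    # Temporal channels come first, followed by the untouched spatial ones.
    return [t_row + s_row for t_row, s_row in zip(shifted, spatial)]


# Three frames, three channels; channels 0 and 1 form the temporal group.
frames = [[1.0, 10.0, 100.0], [2.0, 20.0, 200.0], [3.0, 30.0, 300.0]]
kernels = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]  # shift forward, shift backward
result = gte_sketch(frames, n_temporal=2, kernels=kernels)
```

With these kernels, channel 0 at each frame takes its value from the previous frame, channel 1 takes its value from the next frame, and channel 2 passes through unchanged. In a real network the kernel weights would be trained jointly with the rest of the model, and the shifted features would then pass through a 2D convolution.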

Tags: sps conference, icip 2020
