EFFICIENT TEMPORAL-SPATIAL FEATURE GROUPING FOR VIDEO ACTION RECOGNITION
Zhikang Qiu, Xu Zhao, Zhilan Hu
Temporal information plays an important role in action recognition. Recently, 3D CNNs have been widely used to extract temporal features from videos. However, compared with 2D CNNs, 3D CNNs have more parameters and impose a heavy computational burden, so improving the efficiency of action recognition is necessary. In this paper, inspired by group convolution and convolution kernel decomposition, we propose a novel module called the grouped decomposed module (GDM), which separates channels into three groups and applies 3D, 2D, and 1D convolutions to them in parallel. This module extracts spatial and temporal features efficiently. Based on GDM, we design a new network named the grouped decomposed network (GDN). GDN achieves state-of-the-art performance on two temporal-related datasets (Something-Something V1 & V2) while requiring few parameters and FLOPs.
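As a rough illustration of the grouped parallel design described above, the following is a minimal PyTorch-style sketch. The equal three-way channel split, the kernel sizes, and the realization of the 2D and 1D branches as (1,3,3) and (3,1,1) 3D kernels are assumptions for illustration only, not the authors' exact GDM design.

```python
# Minimal sketch of the grouped decomposed module (GDM) idea: split channels
# into three groups and process them in parallel with 3D, 2D (spatial), and
# 1D (temporal) convolutions. Group ratio and kernel choices are assumptions.
import torch
import torch.nn as nn


class GroupedDecomposedModule(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Assumed equal three-way channel split.
        g = channels // 3
        self.splits = (g, g, channels - 2 * g)
        c3, c2, c1 = self.splits
        # Branch 1: full 3D convolution (temporal + spatial).
        self.conv3d = nn.Conv3d(c3, c3, kernel_size=3, padding=1)
        # Branch 2: 2D spatial convolution, realized as a (1,3,3) 3D kernel.
        self.conv2d = nn.Conv3d(c2, c2, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        # Branch 3: 1D temporal convolution, realized as a (3,1,1) 3D kernel.
        self.conv1d = nn.Conv3d(c1, c1, kernel_size=(3, 1, 1), padding=(1, 0, 0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, channels, time, height, width).
        x3, x2, x1 = torch.split(x, self.splits, dim=1)
        return torch.cat([self.conv3d(x3), self.conv2d(x2), self.conv1d(x1)], dim=1)


if __name__ == "__main__":
    clip = torch.randn(2, 96, 8, 56, 56)  # (N, C, T, H, W)
    out = GroupedDecomposedModule(96)(clip)
    print(out.shape)  # torch.Size([2, 96, 8, 56, 56])
```

Because only one of the three groups uses a full 3D kernel, such a split reduces parameters and FLOPs relative to applying 3D convolution over all channels, which matches the efficiency motivation stated in the abstract.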