  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 12:20
27 Oct 2020

Temporal information is a significant cue for recognizing human actions in videos. 2D CNNs capture spatial information efficiently but cannot model temporal information, whereas 3D CNNs capture both spatial and temporal information at the expense of high computational cost. Going beyond both approaches, this paper presents a Grouped Temporal Enhancement (GTE) module that outperforms 3D CNNs while requiring only a computational cost similar to that of 2D CNNs. The GTE module first splits an input video into spatial and temporal groups along the channel dimension and then applies a learnable temporal shift (LTS) operation for efficient temporal modeling. Finally, a 2D convolution filter is used to strengthen the spatial modeling ability of LTS. Extensive experiments on three benchmark datasets validate the effectiveness of our method.
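The abstract names three steps (channel grouping, a learnable temporal shift, then a 2D convolution) without giving their exact form. The NumPy sketch below illustrates one plausible reading and is not the authors' implementation. Several details are assumptions: features use a (T, C, H, W) layout for a single clip; the first `temporal_channels` channels form the temporal group and the rest form the spatial group, which passes through unchanged; LTS is modeled as a per-channel fractional shift along time with linear interpolation (a swapped-in technique chosen because it makes the offset continuous); and one shared 3x3 convolution is applied per frame afterwards. All function names (`integer_shift`, `fractional_shift`, `learnable_temporal_shift`, `conv2d_same`, `gte_block`) and the `offsets` parameter are hypothetical, and no training loop is shown: the "learnable" offsets are simply passed in.

```python
import numpy as np


def integer_shift(x, k):
    """Shift x along axis 0 (time) by an integer k: out[t] = x[t - k], zero-padded."""
    out = np.zeros_like(x)
    T = x.shape[0]
    if k >= 0:
        if k < T:
            out[k:] = x[:T - k]
    elif -k < T:
        out[:T + k] = x[-k:]
    return out


def fractional_shift(x, s):
    """Shift x along time by a real offset s using linear interpolation.

    out[t] approximates x(t - s); a non-integer s blends the two nearest
    integer shifts, so the output varies smoothly with s (assumed LTS form).
    """
    lo = int(np.floor(s))
    frac = s - lo
    return (1.0 - frac) * integer_shift(x, lo) + frac * integer_shift(x, lo + 1)


def learnable_temporal_shift(x_t, offsets):
    """Apply one (hypothetically learned) temporal offset per channel.

    x_t: (T, C_t, H, W) temporal-group features; offsets: (C_t,) real values.
    """
    x_t = np.asarray(x_t, dtype=float)
    out = np.empty_like(x_t)
    for c, s in enumerate(offsets):
        out[:, c] = fractional_shift(x_t[:, c], s)
    return out


def conv2d_same(x, weight):
    """Per-frame 2D convolution (cross-correlation) with zero 'same' padding.

    x: (T, C, H, W); weight: (O, C, k, k) with odd k. Returns (T, O, H, W).
    """
    T, C, H, W = x.shape
    k = weight.shape[-1]
    p = k // 2
    xpad = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)))
    out = np.zeros((T, weight.shape[0], H, W))
    for i in range(k):
        for j in range(k):
            # Accumulate the contribution of kernel tap (i, j) for every frame.
            out += np.einsum("oc,tchw->tohw", weight[:, :, i, j],
                             xpad[:, :, i:i + H, j:j + W])
    return out


def gte_block(x, temporal_channels, offsets, weight):
    """Hypothetical GTE-style block: split channels, shift the temporal group,
    keep the spatial group unchanged, then apply a shared 2D convolution."""
    x = np.asarray(x, dtype=float)
    x_t, x_s = x[:, :temporal_channels], x[:, temporal_channels:]
    x_t = learnable_temporal_shift(x_t, offsets)
    return conv2d_same(np.concatenate([x_t, x_s], axis=1), weight)


# Usage: 8 frames, 16 channels (4 temporal, 12 spatial), 7x7 feature maps.
rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 16, 7, 7))
offsets = np.array([1.0, -1.0, 0.5, -0.5])
weight = 0.1 * rng.standard_normal((16, 16, 3, 3))
y = gte_block(clip, 4, offsets, weight)
print(y.shape)  # (8, 16, 7, 7)
```

With an integer offset the shift reduces to a plain zero-padded temporal shift, while a fractional offset such as 0.5 averages two neighbouring frames. Because `frac = s - floor(s)` changes continuously with `s`, a framework with automatic differentiation could in principle update the offsets by gradient descent; whether the paper's LTS works this way is not stated in the abstract.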


More Like This

  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00
  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00
  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00