Global And Local Discriminative Patches Exploiting For Action Recognition

Jintao Wu, Wu Luo, Weiwei Liu, Chongyang Zhang

Length: 15:33

04 May 2020

Recent human action recognition models focus mainly on human-centric features, such as pose or skeleton features. However, because they do not exploit the interactive or related scenes, most of these methods fall short of satisfactory performance. In this work we propose a novel multi-stream feature fusion framework based on exploiting discriminative patches. Unlike existing two-stream frameworks and part-based or attention-based multi-stream methods, our work improves recognition accuracy in two ways: 1) It pays more attention to exploiting global and local discriminative patches, which include not only the acting human but also the interactive scenes. 2) It proposes an effective multi-stream feature pooling and fusion mechanism: 2D and 3D ConvNet features from RGB images and discriminative patches are combined to enhance spatio-temporal feature representation. Our framework is evaluated on two widely used video action benchmarks, where it outperforms other state-of-the-art recognition approaches, reaching accuracies of up to 87.8% on HMDB51 and 98.8% on UCF101.
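
The multi-stream pooling and fusion idea can be illustrated with a minimal sketch. The abstract does not specify the pooling or fusion operators, so everything here is an assumption made for illustration: temporal average pooling per stream, L2 normalization, and concatenation as the fusion step. The stream names, feature dimensions, and random stand-in features are hypothetical and are not taken from the paper.

```python
import math
import random

def temporal_average_pool(step_features):
    """Average a list of per-step feature vectors into one vector."""
    n = len(step_features)
    dim = len(step_features[0])
    return [sum(f[i] for f in step_features) / n for i in range(dim)]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm so no stream dominates the fusion."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]

def fuse_streams(streams):
    """Pool each stream over time, normalize it, and concatenate the results.

    `streams` maps a stream name to a list of per-step feature vectors.
    Concatenation is an assumed fusion rule, not the paper's confirmed one.
    """
    fused = []
    for name in sorted(streams):  # fixed order keeps the layout reproducible
        pooled = temporal_average_pool(streams[name])
        fused.extend(l2_normalize(pooled))
    return fused

def fake_features(num_steps, dim):
    """Random stand-in for ConvNet outputs (hypothetical data)."""
    return [[random.random() for _ in range(dim)] for _ in range(num_steps)]

random.seed(0)
# Hypothetical streams: 2D and 3D features from full RGB frames and from
# global/local discriminative patches.
streams = {
    "rgb_2d": fake_features(8, 4),
    "rgb_3d": fake_features(2, 6),
    "global_patch_2d": fake_features(8, 4),
    "local_patch_3d": fake_features(2, 6),
}
video_descriptor = fuse_streams(streams)
print(len(video_descriptor))  # 4 + 6 + 4 + 6 = 20
```

In a real system the fused descriptor would feed a classifier, and a learned fusion layer could replace the plain concatenation used here.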

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020