Global And Local Discriminative Patches Exploiting For Action Recognition
Jintao Wu, Wu Luo, Weiwei Liu, Chongyang Zhang
SPS
Length: 15:33
Recent human action recognition models mainly focus on human features, such as pose or skeleton features. However, because they ignore the interactive or related scenes, most of these methods cannot achieve satisfactory performance. In this work, we propose a novel multi-stream feature fusion framework based on exploiting discriminative patches. Unlike existing two-stream frameworks and part-based or attention-based multi-stream methods, our work improves recognition accuracy by: 1) paying more attention to global and local discriminative patches, which include not only the acting human but also the interactive scenes; and 2) proposing an effective multi-stream feature pooling and fusion mechanism, in which 2D and 3D ConvNet features from RGB images and discriminative patches are combined to enhance spatial-temporal feature representation. Our framework is evaluated on two widely used video action benchmarks, where it outperforms other state-of-the-art recognition approaches, with accuracy of up to 87.8% on HMDB51 and 98.8% on UCF101.
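The abstract does not specify the pooling operator or the fusion rule, so the following is only a minimal sketch of one common way such a multi-stream pooling and fusion step could look. It assumes temporal average pooling, per-stream L2 normalization, and concatenation; these choices, the function names, and the stream names (`rgb_2d`, `patch_3d`) are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def temporal_average_pool(features: Sequence[Sequence[float]]) -> Vector:
    """Average a sequence of per-frame (or per-clip) feature vectors over time."""
    if not features:
        raise ValueError("need at least one feature vector")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors in a stream must share one dimension")
    n = len(features)
    return [sum(f[i] for f in features) / n for i in range(dim)]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_streams(streams: Dict[str, Sequence[Sequence[float]]]) -> Vector:
    """Pool each stream over time, normalize it, and concatenate the results.

    Streams are visited in sorted key order so the layout of the fused
    vector does not depend on dictionary insertion order.
    """
    fused: Vector = []
    for name in sorted(streams):
        fused.extend(l2_normalize(temporal_average_pool(streams[name])))
    return fused


if __name__ == "__main__":
    # Toy features standing in for 2D ConvNet outputs on RGB frames and
    # 3D ConvNet outputs on discriminative patches (assumed, illustrative values).
    example = {
        "rgb_2d": [[3.0, 4.0], [3.0, 4.0]],
        "patch_3d": [[0.0, 2.0]],
    }
    print(fuse_streams(example))
```

Normalizing each stream before concatenation keeps a stream with large activations from dominating the fused descriptor; a learned fusion layer or weighted sum would be an equally plausible alternative under the same description.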