  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 15:33
04 May 2020

Recent human action recognition models mainly focus on exploiting human features, such as pose or skeleton features. However, because they ignore interactive or related scenes, most of these methods cannot achieve sufficiently good performance. In this work we propose a novel multi-stream feature fusion framework based on exploiting discriminative patches. Unlike existing two-stream, part-based, or attention-based multi-stream methods, our work improves recognition accuracy by: 1) paying more attention to exploiting global and local discriminative patches, which include not only the acting human but also the interactive scenes; and 2) proposing an effective multi-stream feature pooling and fusion mechanism, in which 2D and 3D ConvNet features from RGB images and discriminative patches are combined to enhance spatial-temporal feature representation ability. Our framework is evaluated on two widely used video action benchmarks, where it outperforms other state-of-the-art recognition approaches, reaching accuracies of 87.8% on HMDB51 and 98.8% on UCF101.
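To make the fusion mechanism concrete, the following is a minimal sketch of late fusion by concatenation of pooled per-stream descriptors. All names, dimensions, and the normalize-then-concatenate design are illustrative assumptions, not the paper's actual implementation; the four random vectors stand in for pooled 2D/3D ConvNet features from the full RGB frames and from the cropped discriminative patches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled per-stream descriptors (dims are illustrative):
rgb_2d   = rng.standard_normal(512)   # 2D ConvNet feature, full RGB frames
rgb_3d   = rng.standard_normal(512)   # 3D ConvNet feature, full RGB clip
patch_2d = rng.standard_normal(512)   # 2D ConvNet feature, discriminative patches
patch_3d = rng.standard_normal(512)   # 3D ConvNet feature, discriminative patches

def l2_normalize(v):
    """Scale a feature vector to unit L2 norm so no stream dominates the fusion."""
    return v / (np.linalg.norm(v) + 1e-8)

# Late fusion: concatenate the normalized per-stream descriptors.
fused = np.concatenate([l2_normalize(v)
                        for v in (rgb_2d, rgb_3d, patch_2d, patch_3d)])

# A linear classification head over the fused descriptor (weights random here).
num_classes = 51  # e.g. HMDB51
W = rng.standard_normal((num_classes, fused.size)) * 0.01
logits = W @ fused
scores = np.exp(logits - logits.max())      # stable softmax
probs = scores / scores.sum()
print(fused.shape, probs.argmax())
```

Concatenation is only one plausible fusion choice; weighted score averaging across streams would be an equally simple alternative under the same assumptions.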
