  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:15:12
20 Sep 2021

Graph convolutional network (GCN)-based models for human action recognition, which take 3D skeleton sequences as input, have recently attracted much attention and achieved good performance. In this paper, we propose enriching joint information with higher-order features/attributes to improve their recognition performance. Every joint in a spatio-temporal skeleton is described by a set of 3-component vectors computed with reference to up to 3 neighboring joints in the spatio-temporal domain. The referenced joints are either physically connected in the spatial domain or in correspondence in the temporal domain. This rich higher-order joint information is fed to two kinds of GCN-based networks in two ways: early fusion and late fusion. Early fusion concatenates the 3-component vectors as different channels at the input nodes; late fusion feeds each 3-component vector to a separate stream of a multi-stream GCN and then fuses the stream outputs to make the action recognition decision. We also propose cascading a view-adaptive (VA) sub-network to further improve performance. Experiments on the NTU RGB-D 60 dataset show that our approach boosts the accuracy of the original GCN networks by up to 1.57% with early fusion and 2.55% with late fusion under the cross-subject (CS) protocol.
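
The difference between the two fusion styles can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The 5-joint `PARENT` chain, the function names, the choice of bone and motion offsets as the 3-component vectors, and score averaging as the late-fusion rule are all assumptions made for illustration. The abstract refers to up to 3 neighboring joints, whereas this sketch uses only one neighbor per feature.

```python
import numpy as np

# Hypothetical toy skeleton: 5 joints in a chain.
# PARENT[j] is the joint physically connected to joint j (the root points to itself).
PARENT = np.array([0, 0, 1, 2, 3])


def joint_features(seq):
    """Build per-joint 3-component vectors from a skeleton sequence.

    seq: array of shape (T, V, 3) holding 3D joint positions for
         T frames and V joints.
    Returns a list of three (T, V, 3) arrays (assumed feature set).
    """
    # Spatial neighbor: offset from each joint to its connected parent joint.
    bone = seq - seq[:, PARENT, :]
    # Temporal neighbor: offset from the same joint in the previous frame.
    motion = np.zeros_like(seq)
    motion[1:] = seq[1:] - seq[:-1]
    return [seq, bone, motion]


def early_fusion(features):
    """Concatenate the 3-component vectors as channels: (T, V, 3 * n)."""
    return np.concatenate(features, axis=-1)


def late_fusion(stream_scores):
    """Fuse per-stream class scores; averaging is an assumed fusion rule."""
    return np.mean(np.stack(stream_scores, axis=0), axis=0)
```

With early fusion, a single network would receive the `(T, V, 9)` array from `early_fusion(joint_features(seq))`. With late fusion, each array from `joint_features(seq)` would go to its own stream, and only the per-stream class scores would be combined by `late_fusion`.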
