High-Order Joint Information Input For Graph Convolutional Network Based Action Recognition
Wen-Nung Lie, Yong-Jhu Huang, Jui-Chiu Chiang, Zhen-Yu Fang
Graph Convolutional Network (GCN)-based networks for human action recognition, which accept 3D skeleton sequences as input, have recently gained much attention and achieved good performance. In this paper, we propose joint information enriched with higher-order features/attributes to boost their recognition performance. Each joint in a spatio-temporal skeleton is described by a set of 3-component vectors computed by referring to up to three neighboring joints in the spatio-temporal domain; the referenced joints are either physically connected joints in the spatial domain or corresponding joints in the temporal domain. This high-order joint information is fed to two kinds of GCN-based networks in two ways: early fusion and late fusion. In early fusion, the 3-component vectors are concatenated as separate channels at the input nodes; in late fusion, each 3-component vector is fed to a separate stream of a multi-stream GCN network, and the stream outputs are then fused for the final action recognition decision. We also propose cascading a view-adaptive (VA) sub-network to further improve performance. Experiments on the NTU RGB+D 60 dataset show that our approach boosts the accuracy of the original GCN networks by up to 1.57% with early fusion and 2.55% with late fusion under the cross-subject (CS) protocol.
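To make the two input arrangements concrete, the following is a minimal sketch, not the authors' implementation: it assumes skeletons of shape (T, V, 3), a hypothetical 5-joint kinematic tree, and one particular choice of three neighbors (the spatial parent joint plus the same joint in the previous and next frames). The exact neighbor definition in the paper may differ.

```python
# Sketch (assumptions, not the authors' code): build 3-component vectors per
# joint from spatio-temporal neighbors, then arrange them for early/late fusion.
import numpy as np

# Hypothetical parent table for a toy 5-joint skeleton (joint 0 is the root).
PARENT = np.array([0, 0, 1, 1, 3])

def high_order_features(skel: np.ndarray) -> np.ndarray:
    """skel: (T, V, 3) joint positions -> (T, V, 9) enriched features.

    Each joint is described by three 3-component vectors:
      1) offset to its spatially connected (parent) joint,
      2) offset from the same joint in the previous frame,
      3) offset to the same joint in the next frame.
    """
    spatial = skel - skel[:, PARENT, :]        # bone-like spatial vector
    prev = skel - np.roll(skel, 1, axis=0)     # backward temporal motion
    nxt = np.roll(skel, -1, axis=0) - skel     # forward temporal motion
    prev[0], nxt[-1] = 0.0, 0.0                # no temporal neighbor at ends
    return np.concatenate([spatial, prev, nxt], axis=-1)

T, V = 32, 5
skel = np.random.randn(T, V, 3).astype(np.float32)
feats = high_order_features(skel)              # (32, 5, 9)

# Early fusion: the three vectors become 9 input channels of a single GCN,
# i.e. one input tensor of shape (C=9, T, V).
early_input = feats.transpose(2, 0, 1)

# Late fusion: each 3-component vector feeds its own GCN stream; the streams'
# class scores are then fused (e.g., averaged) for the final decision.
streams = [feats[..., 3 * i:3 * (i + 1)].transpose(2, 0, 1) for i in range(3)]
print(early_input.shape, [s.shape for s in streams])
```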