DRONECAPS: RECOGNITION OF HUMAN ACTIONS IN DRONE VIDEOS USING CAPSULE NETWORKS WITH BINARY VOLUME COMPARISONS
Abdullah Algamdi, Victor Sanchez, Chang-Tsun Li
SPS
Length: 15:10
Understanding human actions in videos captured by drones is a challenging computer vision task: individuals appear from unusual viewpoints, and their apparent size changes with the camera's location and motion. This work proposes DroneCaps, a capsule network architecture for multi-label human action recognition (HAR) in drone videos. DroneCaps combines features computed by 3D convolutional neural networks with a new set of features computed by a novel Binary Volume Comparison layer. Together with the learning power of capsule networks (CapsNets), these features allow the model to abstract over the different viewpoints and poses of the depicted individuals, thus improving multi-label HAR. An evaluation on multi-label classification shows that DroneCaps outperforms state-of-the-art methods on the Okutama-Action dataset.
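To make the multi-label setting concrete, the sketch below shows how a generic capsule network can report several actions at once: each class capsule outputs a vector, the standard squash nonlinearity maps its length into [0, 1), and every class whose length exceeds a threshold is predicted. This is only an illustration under stated assumptions, not the DroneCaps implementation. The function names, the 0.5 threshold, the 2-D capsule vectors, and the example values are all hypothetical, and the Binary Volume Comparison layer is not modeled because the abstract does not describe it.

```python
import math


def squash(vector):
    """Standard capsule squash: keep the direction, map the length into [0, 1)."""
    sq_norm = sum(x * x for x in vector)
    if sq_norm == 0.0:
        return [0.0 for _ in vector]
    norm = math.sqrt(sq_norm)
    # New length is sq_norm / (1 + sq_norm); dividing by norm rescales the vector.
    scale = sq_norm / (1.0 + sq_norm) / norm
    return [scale * x for x in vector]


def capsule_length(vector):
    """Euclidean length of a capsule vector, read as the class presence score."""
    return math.sqrt(sum(x * x for x in vector))


def predict_labels(class_capsules, threshold=0.5):
    """Multi-label decision: return every class whose squashed length exceeds
    the threshold (the threshold value is an assumption for illustration)."""
    return [
        name
        for name, raw_vector in class_capsules.items()
        if capsule_length(squash(raw_vector)) > threshold
    ]


# Hypothetical raw class-capsule outputs for one detected person.
raw = {
    "walking": [3.0, 4.0],
    "carrying": [1.2, 0.9],
    "sitting": [0.1, 0.2],
}

print(predict_labels(raw))  # → ['walking', 'carrying']
```

Because each class is thresholded independently rather than competing through a softmax, a person can be assigned concurrent actions such as walking while carrying an object, which is what distinguishes multi-label from single-label HAR.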