Structural Reparameterization Lightweight Network for Video Action Recognition
AnLei Zhu (Jiangnan University); Wang Yinghui (Jiangnan University); Wei Li (Jiangnan University); Pengjiang Qian (Jiangnan University)
SPS
3D convolutional networks play an important role in extracting spatiotemporal features for video action recognition. However, they usually introduce a large number of parameters, which makes deployment difficult on edge devices with limited memory. Although lightweight 3D CNNs can reduce the model size significantly, they cause a serious loss of accuracy. This paper proposes a novel approach that reduces the model size while preserving accuracy by combining lightweight networks with structural reparameterization. To reduce the model size, we propose the 3D-DBB module, based on the 2D Diverse Branch Block (DBB). Furthermore, we propose three structures based on 3D-DBB: (1) a 3D depthwise convolution (called 3D-DBB DepthWise), (2) a 3D pointwise convolution (called 3D DBB-PointWise), and (3) a reparameterizable depthwise separable structure (called DP3DBB), which is the concatenation of the two previous structures. We design and compare two different schemes for replacing the depthwise separable structures in lightweight networks. Our method achieves 93.33% accuracy, a loss of only 0.42%, with a model size only 1/50 that of 3D-ResNeXt101.
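The core idea behind structural reparameterization can be illustrated with a minimal sketch. The abstract does not describe the branch layout of 3D-DBB, so the setup below is an assumption chosen for illustration: a single channel (as in a depthwise convolution) with two parallel branches, one 3x3x3 kernel and one 1x1x1 kernel, and no batch normalization. The function names (`conv3d_same`, `merge_branches`, and so on) are hypothetical and do not come from the paper. Because convolution is linear, the two training-time branches can be folded into one 3x3x3 kernel for deployment without changing the output.

```python
import random


def conv3d_same(x, k):
    """Single-channel 3D cross-correlation, stride 1, zero padding (output keeps the input shape)."""
    T, H, W = len(x), len(x[0]), len(x[0][0])
    kt, kh, kw = len(k), len(k[0]), len(k[0][0])
    pt, ph, pw = kt // 2, kh // 2, kw // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for h in range(H):
            for w in range(W):
                acc = 0.0
                for a in range(kt):
                    for b in range(kh):
                        for c in range(kw):
                            tt, hh, ww = t + a - pt, h + b - ph, w + c - pw
                            # Positions outside the volume count as zeros.
                            if 0 <= tt < T and 0 <= hh < H and 0 <= ww < W:
                                acc += x[tt][hh][ww] * k[a][b][c]
                out[t][h][w] = acc
    return out


def add3d(a, b):
    """Elementwise sum of two equally shaped 3D nested lists."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(pa, pb)]
            for pa, pb in zip(a, b)]


def pad_1x1x1_to_3x3x3(scale):
    """Embed a 1x1x1 kernel at the centre of an all-zero 3x3x3 kernel."""
    k = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    k[1][1][1] = scale
    return k


def merge_branches(k3, k1_scale):
    """Fold the 3x3x3 branch and the 1x1x1 branch into one 3x3x3 kernel."""
    return add3d(k3, pad_1x1x1_to_3x3x3(k1_scale))


def max_abs_diff(a, b):
    """Largest absolute elementwise difference between two 3D nested lists."""
    return max(abs(p - q) for pa, pb in zip(a, b)
               for ra, rb in zip(pa, pb) for p, q in zip(ra, rb))


rng = random.Random(0)
x = [[[rng.uniform(-1, 1) for _ in range(5)] for _ in range(5)] for _ in range(4)]
k3 = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(3)]
k1_scale = 0.7  # weight of the 1x1x1 branch (illustrative value)

# Training-time structure: two parallel branches whose outputs are summed.
train_out = add3d(conv3d_same(x, k3), conv3d_same(x, [[[k1_scale]]]))

# Deployment-time structure: one convolution with the merged kernel.
merged = merge_branches(k3, k1_scale)
deploy_out = conv3d_same(x, merged)

max_diff = max_abs_diff(train_out, deploy_out)
print("max difference between the two structures:", max_diff)
```

The two outputs agree up to floating-point rounding, which is why a multi-branch block can be trained for accuracy and then collapsed into a single convolution to keep the deployed model small. A real block would also need to fold any normalization layers into the kernels and biases before merging, a step this sketch omits.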