MTFD: Multi-teacher Fusion Distillation For Compressed Video Action Recognition

Jinxin Guo (Inner Mongolia University); Jiaqiang Zhang (Inner Mongolia University); Shaojie Li (Inner Mongolia University); Xiaojing Zhang (Inner Mongolia University); Ming Ma (Inner Mongolia University)

06 Jun 2023

Action recognition is an important task in computer vision, and recent representative approaches such as two-stream networks, 3D ConvNets, and Transformer-based networks have achieved outstanding performance on it. However, their high computational cost, in both computation time and parameter count, prevents them from meeting the needs of real-time applications. Current approaches compute on the keyframes and motion information retained in compressed video, which greatly reduces the computational effort but still cannot satisfy real-time applications. Therefore, we propose MTFD, a multi-teacher fusion distillation framework for compressed video action recognition. Unlike traditional methods, which transfer the knowledge of one or more teachers directly into the student model, MTFD also performs knowledge transfer between the teachers, and it achieves better knowledge distillation through their mutual guidance and information fusion. Furthermore, we improve the network's ability to extract motion information, which ultimately reduces the computational effort while maintaining high accuracy.
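The abstract does not give the loss functions, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each teacher outputs class logits, that "mutual guidance" can be approximated by an average pairwise KL divergence between the teachers' softened predictions, and that "information fusion" can be approximated by averaging those predictions before distilling into the student. The function names, the temperature value, and the averaging rule are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def fusion_distillation_losses(teacher_logits, student_logits, temperature=4.0):
    """Return (mutual_loss, student_loss) for one sample.

    Assumption: mutual guidance is modeled as the mean pairwise KL between
    teachers, and fusion as a plain average of teacher distributions.
    The paper may use different formulations.
    """
    teacher_probs = [softmax(t, temperature) for t in teacher_logits]
    student_probs = softmax(student_logits, temperature)
    n = len(teacher_probs)

    # Fused teacher distribution: per-class mean over all teachers.
    fused = [sum(column) / n for column in zip(*teacher_probs)]

    # Teacher-to-teacher term: each teacher is pulled toward every other one.
    mutual, pairs = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i != j:
                mutual += kl_divergence(teacher_probs[j], teacher_probs[i])
                pairs += 1
    mutual = mutual / pairs if pairs else 0.0

    # Teacher-to-student term: the student matches the fused distribution.
    student = kl_divergence(fused, student_probs)
    return mutual, student


# Hypothetical logits from two teachers (e.g. a keyframe stream and a
# motion stream) and one student, for a 3-class problem.
mutual_loss, student_loss = fusion_distillation_losses(
    teacher_logits=[[3.0, 0.0, 0.0], [0.0, 3.0, 0.0]],
    student_logits=[0.0, 0.0, 3.0],
)
```

In a training loop, the mutual term would update the teachers and the student term would update the student, typically alongside the usual cross-entropy loss on ground-truth labels; how these terms are weighted is left open here.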

Tags:

Image and video representation