Learning from the raw domain: cross modality distillation for compressed video action recognition

Yufan Liu (Institute of Automation, Chinese Academy of Sciences); Jiajiong Cao (Ant Financial Service Group); Weiming Bai (Chinese Academy of Sciences); Bing Li (National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences); Weiming Hu (Institute of Automation, Chinese Academy of Sciences)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
06 Jun 2023

Video action recognition must meet demanding performance requirements while carrying a heavy computational burden. Operating on compressed-domain data, which avoids much of the decoding cost, is one possible solution. However, existing compressed-domain (CD) methods fall short of the performance of state-of-the-art (SOTA) raw-domain (RD) methods. To close this gap, we propose a cross-modality knowledge distillation method that forces the CD model to learn from the RD model. Specifically, spatial knowledge and temporal knowledge are first constructed to align the feature spaces of the raw domain and the compressed domain. An adaptive multi-path knowledge learning scheme is then presented to help the CD model learn more efficiently. Experiments on both large-scale and small-scale datasets verify the effectiveness of the proposed method.
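
The abstract gives no formulas, so the following is only an illustrative sketch of how a two-path distillation loss of this general kind might be assembled. It is not the authors' formulation. Everything in it is an assumption made for illustration: mean squared error as the alignment measure, frame-to-frame feature differences as a stand-in for temporal knowledge, a softmax over the per-path losses as the adaptive weighting, and all function names (`mse`, `frame_deltas`, `path_weights`, `distillation_loss`).

```python
import math

def mse(a, b):
    """Mean squared error between two equally shaped lists of feature vectors."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def frame_deltas(feats):
    """Frame-to-frame feature differences (assumed proxy for temporal knowledge)."""
    return [[x1 - x0 for x0, x1 in zip(f0, f1)] for f0, f1 in zip(feats, feats[1:])]

def path_weights(losses, temperature=1.0):
    """Softmax over per-path losses (hypothetical rule: a larger loss gets more weight)."""
    m = max(losses)  # subtract the max for numerical stability
    exps = [math.exp((loss - m) / temperature) for loss in losses]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(cd_feats, rd_feats, temperature=1.0):
    """Weighted sum of spatial and temporal alignment losses between CD and RD features."""
    spatial = mse(cd_feats, rd_feats)
    temporal = mse(frame_deltas(cd_feats), frame_deltas(rd_feats))
    weights = path_weights([spatial, temporal], temperature)
    total = weights[0] * spatial + weights[1] * temporal
    return total, weights

# Toy features: 3 frames, 2 dimensions. The CD features are offset by a constant,
# so the spatial path disagrees with the RD features while the temporal path matches.
rd = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
cd = [[x + 1.0 for x in frame] for frame in rd]
loss, weights = distillation_loss(cd, rd)
print(loss, weights)
```

In this toy case the constant offset leaves the frame-to-frame differences unchanged, so only the spatial path contributes to the loss and it receives the larger weight. A real system would compute such terms on network feature maps inside a training loop, which this sketch does not attempt.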
