    Length: 00:12:41
03 Oct 2022

Few-shot action recognition aims to learn novel action classes from only a few annotated samples. This is a challenging problem because motion modeling is difficult, especially when only a few training samples are available. Visual tempo, an essential variation factor of video semantics, characterizes the dynamic motion information in action videos. In this work, we propose a visual tempo contrastive learning framework (VTCL) to tackle the few-shot action recognition problem. Specifically, we propose a visual tempo encoding (VTE) module for visual tempo learning. The VTE module samples the same action instance at different frame sampling rates to obtain different visual tempo encoding vectors, which jointly form the features of each instance. To enhance the discriminability of these vectors, we propose a visual tempo contrastive encoding (VTCE) loss that promotes intra-class compactness and inter-class separation. Extensive experiments demonstrate that the proposed VTCL achieves promising results compared with competing state-of-the-art methods on three few-shot action recognition datasets: Kinetics, UCF101 and HMDB51.
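The two ideas in the abstract, sampling one clip at several frame rates and pulling same-class encodings together with a contrastive loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the strides (1, 2, 4), the centred sampling scheme, and the loss (a generic supervised contrastive loss standing in for the VTCE loss, whose exact form the abstract does not give) are all assumptions made for illustration.

```python
import numpy as np

def sample_frame_indices(num_frames, num_samples, stride):
    """Return `num_samples` frame indices spaced `stride` apart, roughly centred.

    Hypothetical helper: a larger stride covers a longer time span with the
    same number of frames, i.e. a different visual tempo of the same clip.
    Indices are clipped so short clips repeat their last frame.
    """
    span = (num_samples - 1) * stride
    start = max((num_frames - 1 - span) // 2, 0)
    idx = start + stride * np.arange(num_samples)
    return np.clip(idx, 0, num_frames - 1)

def multi_tempo_indices(num_frames, num_samples=8, strides=(1, 2, 4)):
    """Index sets for one clip at several sampling rates (assumed strides)."""
    return {s: sample_frame_indices(num_frames, num_samples, s) for s in strides}

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over L2-normalised embeddings.

    Assumption: used here only as a stand-in for a loss that rewards
    intra-class compactness and inter-class separation.
    """
    labels = np.asarray(labels)
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    sim = sim - sim.max(axis=1, keepdims=True)        # numerical stability
    not_self = ~np.eye(len(labels), dtype=bool)
    exp_sim = np.exp(sim) * not_self                  # exclude self-similarity
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0                             # anchors with a positive
    per_anchor = -(log_prob * positives).sum(axis=1)[valid] / pos_count[valid]
    return float(per_anchor.mean())

if __name__ == "__main__":
    for stride, idx in multi_tempo_indices(num_frames=32).items():
        print(f"stride {stride}: {idx.tolist()}")
    emb = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]])
    print(supervised_contrastive_loss(emb, [0, 0, 1, 1]))
```

In a full pipeline, each index set would be passed through a video backbone to produce one tempo encoding per stride, and the loss would be applied to those encodings; both steps are omitted here because the abstract does not specify them.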