Adaptive Detail Injection-Based Feature Pyramid Network for Pan-Sharpening
Yi Sun, Yuanlin Zhang, Yuan Yuan
Few-shot action recognition aims to learn novel action classes from only a few annotated samples. This is challenging because motion modeling is difficult, especially when only a few training samples are available. Visual tempo, an essential variation factor of video semantics, characterizes the dynamic motion information in action videos. In this work, we propose a visual tempo contrastive learning framework (VTCL) to tackle the few-shot action recognition problem. Specifically, we propose a visual tempo encoding (VTE) module for visual tempo learning. The VTE module samples the same action instance at different frame sampling rates to obtain different visual tempo encoding vectors, which jointly form the feature of each instance. To enhance the discriminability of these vectors, we propose a visual tempo contrastive encoding (VTCE) loss that promotes intra-class compactness and inter-class separation of the visual tempo encoding vectors. Extensive experiments demonstrate that the proposed VTCL achieves promising results compared with competing state-of-the-art methods on three few-shot action recognition datasets: Kinetics, UCF101, and HMDB51.
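The abstract outlines two components: multi-rate frame sampling that yields one encoding vector per visual tempo (VTE), and a contrastive loss that pulls same-class tempo vectors together while pushing different classes apart (VTCE). Below is a minimal PyTorch-style sketch of that idea, assuming clip tensors of shape (T, C, H, W) and class-labelled embeddings; the helper names, the sampling rates (1, 2, 4), the temperature, and the supervised-contrastive form of the loss are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F


def sample_multi_rate_clips(video, rates=(1, 2, 4), clip_len=8):
    """Hypothetical VTE-style sampling: re-sample one action instance at
    several frame rates ("visual tempos"); video is a (T, C, H, W) tensor."""
    clips = []
    for r in rates:
        clip = video[::r][:clip_len]                 # subsample every r-th frame
        if clip.shape[0] < clip_len:                 # pad short clips by repeating the last frame
            pad = clip[-1:].repeat(clip_len - clip.shape[0], 1, 1, 1)
            clip = torch.cat([clip, pad], dim=0)
        clips.append(clip)
    return clips                                     # one clip per sampling rate


def tempo_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive-style loss over tempo encoding vectors.

    embeddings: (N, D), one row per (instance, sampling rate) pair.
    labels:     (N,) class ids; same-class rows are treated as positives,
    encouraging intra-class compactness and inter-class separation.
    """
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                    # scaled cosine similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    sim = sim.masked_fill(self_mask, float('-inf'))  # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_count
    return loss.mean()
```

In such a setup, each episode would encode every support and query clip at each sampling rate with a shared backbone, combine the resulting tempo vectors into the instance feature, and add the contrastive term to the few-shot classification objective; how the two terms are weighted is not specified in the abstract and is left open here.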