Skip to main content
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:13:30
11 Jun 2021

Few-shot learning can recognize a novel category based on only a few samples because it learns to learn from a lot of labeled samples during the training process. When data is insufficient, the performance is affected. And it is expensive to obtain a large-scale fine-grained dataset with annotation. In this paper, we adopt domain-specific knowledge to fill the gap of insufficient annotated data. We propose a cross-modal knowledge distillation (CMKD) framework to do fine-grained one-shot classification and propose the Spatial Relation Loss (SRL) to transfer cross-modal information, which can tackle the semantic gap between multimodal features. The teacher network distills the spatial relationship of the samples as a soft target for training a unimodal student network. Notably, the student network makes predictions only based on a few samples without any external knowledge in the application. This model-agnostic framework will be well adapted to other few-shot models. Extensive experimental results on benchmarks demonstrate that CMKD can make full use of cross-modal knowledge in image and text few-shot classification. CKMD improves the performances of the student networks significantly, even if it is a state-of-the-art student network.

Chairs:
Chaker Larabi