Boosting Fine-grained Sketch-based Image Retrieval with Self-supervised Learning
Zhaolong Zhang (Fudan University); Yangdong Chen (Fudan University); Yuejie Zhang (Fudan University); Rui Feng (Fudan University); Tao Zhang (Shanghai University of Finance and Economics)
SPS
Fine-grained sketch-based image retrieval (FG-SBIR) aims to align images and sketches at the instance level. The task is challenging because of the significant domain gap between sketches and images. Existing methods often underperform due to the lack of large-scale fine-grained image-sketch datasets and their strong dependence on classification models pretrained on ImageNet. In this paper, we propose a self-supervised pre-trained FG-SBIR model that does not depend on large-scale annotated datasets. Only images and their corresponding edge maps are used at the pre-training stage. A mixed modal transformation is designed to generate different mixed-up views. The FG-SBIR model is pre-trained by minimizing the distance between views of the same instance and then fine-tuned with a simple triplet loss. With a plain downstream network, it generally outperforms state-of-the-art models on three widely used FG-SBIR datasets.
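The two training objectives described above can be sketched in toy form: a pre-training loss that pulls together two mixed-up views of the same instance, and a triplet loss for fine-tuning. The mixing scheme, function names, and margin value below are illustrative assumptions, not the authors' exact formulation; embeddings stand in for network outputs.

```python
import math

def mix_views(image, edge_map, alpha=0.5):
    """Blend an image with its edge map into one mixed-modal view.
    Images are flat lists of floats; alpha is an assumed mix ratio."""
    return [alpha * i + (1.0 - alpha) * e for i, e in zip(image, edge_map)]

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pretrain_loss(view_a, view_b):
    """Self-supervised objective: minimize the distance between two
    mixed-up views of the same instance."""
    return l2_distance(view_a, view_b)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss for fine-tuning: the sketch (anchor) should
    be closer to its paired image (positive) than to a non-matching
    image (negative) by at least `margin`."""
    return max(0.0, l2_distance(anchor, positive)
                    - l2_distance(anchor, negative) + margin)

# Toy usage: two views of the same instance with different mix ratios.
image = [0.9, 0.1, 0.4]
edges = [1.0, 0.0, 0.0]
v1 = mix_views(image, edges, alpha=0.3)
v2 = mix_views(image, edges, alpha=0.7)
print(round(pretrain_loss(v1, v2), 4))                       # small: same instance
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 1.0]))      # 0.0: margin satisfied
```

In the actual model, the distances would be computed on learned embeddings rather than raw pixels, but the loss structure is the same.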