Feature Space Disentangling Based On Spatial Attention For Makeup Transfer
Jinli Zhou, Yaxiong Chen, Zhaoyang Sun, Chang Zhan, Feng Liu, Shengwu Xiong
This paper presents human-centric image retrieval with gaze-based image captioning. Although the development of cross-modal embedding techniques has enabled advanced image retrieval, many methods have focused only on the information obtained from the contents themselves, such as images and text. To further extend image retrieval, it is necessary to construct retrieval techniques that directly reflect human intentions. In this paper, we propose a new retrieval approach via image captioning based on gaze information, focusing on the fact that gaze information obtained from humans contains semantic information. Specifically, we construct a transformer-based connect caption and gaze trace (CGT) model that learns the relationship among images, captions provided by humans, and gaze traces. Our CGT model enables transformer-based learning by dividing the gaze traces into bounding boxes, which makes gaze-based image captioning feasible. By using the obtained captions for cross-modal retrieval, we can achieve human-centric image retrieval. The technical contribution of this paper is transforming the gaze trace into a caption via the transformer-based encoder. In the experiments, the effectiveness of the proposed method is demonstrated by comparison with a cross-modal embedding method.
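The abstract describes dividing a gaze trace into bounding boxes so that a transformer encoder can process it as a token sequence. Below is a minimal sketch of that idea, not the authors' released code: the time-window grouping heuristic, module names, and dimensions are all assumptions made purely for illustration.

```python
# Sketch: group a gaze trace into bounding boxes and encode them with a
# transformer encoder. Assumed, illustrative implementation only.

import torch
import torch.nn as nn


def gaze_trace_to_boxes(trace, window=10):
    """Group a gaze trace of (x, y) fixations into bounding boxes.

    trace: tensor of shape (T, 2), normalized coordinates in [0, 1].
    Returns a (N, 4) tensor of boxes (x_min, y_min, x_max, y_max),
    one box per consecutive window of fixations (assumed grouping rule).
    """
    boxes = []
    for start in range(0, trace.shape[0], window):
        chunk = trace[start:start + window]
        mins = chunk.min(dim=0).values
        maxs = chunk.max(dim=0).values
        boxes.append(torch.cat([mins, maxs]))
    return torch.stack(boxes)


class GazeRegionEncoder(nn.Module):
    """Encode gaze-derived boxes as a token sequence with a transformer encoder."""

    def __init__(self, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.box_embed = nn.Linear(4, d_model)  # box coordinates -> token embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, boxes):  # boxes: (B, N, 4)
        tokens = self.box_embed(boxes)
        return self.encoder(tokens)  # (B, N, d_model) gaze-region features


if __name__ == "__main__":
    trace = torch.rand(50, 2)                        # synthetic 50-fixation gaze trace
    boxes = gaze_trace_to_boxes(trace).unsqueeze(0)  # (1, N, 4)
    feats = GazeRegionEncoder()(boxes)
    print(feats.shape)                               # torch.Size([1, 5, 256])
```

In a full pipeline, such gaze-region features would be fed, together with image features, to a captioning decoder, and the generated captions would then be embedded for cross-modal retrieval.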