
Feature Space Disentangling Based On Spatial Attention For Makeup Transfer

Jinli Zhou, Yaxiong Chen, Zhaoyang Sun, Chang Zhan, Feng Liu, Shengwu Xiong

Length: 00:09:27
03 Oct 2022

This paper presents human-centric image retrieval with gaze-based image captioning. Although the development of cross-modal embedding techniques has enabled advanced image retrieval, many methods focus only on information obtained from the content itself, such as images and text. To extend image retrieval further, retrieval techniques that directly reflect human intentions are needed. In this paper, we propose a new retrieval approach via image captioning based on gaze information, motivated by the fact that the gaze information obtained from humans carries semantic information. Specifically, we construct a transformer-based connect caption and gaze trace (CGT) model that learns the relationships among images, captions provided by humans, and gaze traces. The CGT model enables transformer-based learning by dividing each gaze trace into a set of bounding boxes, which makes gaze-based image captioning feasible. By using the generated captions for cross-modal retrieval, we achieve human-centric image retrieval. The technical contribution of this paper is the transformation of gaze traces into captions via a transformer-based encoder. In the experiments, the effectiveness of the proposed method is demonstrated by comparison with a cross-modal embedding method.
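
The abstract sketches a pipeline in which a gaze trace is discretized into bounding boxes and encoded with a transformer before captioning. Below is a minimal sketch of that encoding step, assuming PyTorch; the names (GazeTraceEncoder, trace_to_boxes) and the five-value box format (corners plus a dwell-time slot) are illustrative assumptions, not the authors' actual CGT implementation.

```python
import torch
import torch.nn as nn


class GazeTraceEncoder(nn.Module):
    """Encodes a gaze trace, discretized into bounding boxes, with a
    standard transformer encoder so it can later be fused with image
    features for caption generation (hypothetical sketch)."""

    def __init__(self, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        # Each gaze-derived box is (x1, y1, x2, y2, dwell_time).
        self.box_embed = nn.Linear(5, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, boxes):
        # boxes: (batch, num_boxes, 5) -> (batch, num_boxes, d_model)
        return self.encoder(self.box_embed(boxes))


def trace_to_boxes(trace, box_size=32.0):
    """Turns a sequence of (x, y) fixation points into fixed-size
    bounding boxes; a crude stand-in for the paper's discretization."""
    half = box_size / 2
    return torch.stack([
        torch.tensor([x - half, y - half, x + half, y + half, 1.0])
        for x, y in trace
    ])


# Usage: encode a two-fixation trace as a batch of one.
boxes = trace_to_boxes([(120.0, 80.0), (200.5, 150.0)]).unsqueeze(0)
features = GazeTraceEncoder()(boxes)  # (1, 2, 256)
```

The resulting per-box features would then feed a caption decoder, whose output captions are embedded for cross-modal retrieval as described above.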
