DATASET-LEVEL DIRECTED IMAGE TRANSLATION FOR CROSS-DOMAIN CROWD COUNTING

Xin Tan, Hiroshi Ishikawa

Poster 10 Oct 2023

Most crowd counting methods rely on large amounts of manually labeled data to train a supervised model. With synthetic datasets now available, one way to alleviate the scarcity of large-scale labeled data is to use an image-to-image translation method to adapt synthetic data for training. However, previous methods focus on adapting local visual features of the image, which leads to distorted and blurry translation results. In this paper, we propose a novel CLIP-guided image-to-image translation method, based on the observation that synthetic and real images can be easily separated in CLIP’s embedding space. We use the difference between the two domains in CLIP space as a consistent guide to train an image translator. A crowd counting model is then trained on images translated from synthetic data by the translator. Experiments on real-world crowd counting datasets demonstrate the effectiveness of the proposed method, which enables the crowd counting model to achieve state-of-the-art performance.
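
The abstract does not spell out the loss, so the following is only a minimal sketch of one plausible reading of "the difference between two domains in CLIP space as a consistent guide", not the authors' implementation. It assumes a single dataset-level direction, computed as the normalized difference of the mean synthetic and mean real embeddings, and a cosine-based penalty on how far each translated image's embedding shift deviates from that direction. The names `domain_direction` and `directional_loss` are hypothetical, and random vectors stand in for real CLIP image embeddings.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero for degenerate vectors


def unit(v):
    """Scale vectors along the last axis to unit length."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + EPS)


def domain_direction(syn_emb, real_emb):
    """Assumed dataset-level guide: unit vector from the synthetic mean
    embedding to the real mean embedding."""
    return unit(real_emb.mean(axis=0) - syn_emb.mean(axis=0))


def directional_loss(src_emb, out_emb, direction):
    """Mean of (1 - cosine) between each image's embedding shift
    (translated minus source) and the dataset-level direction."""
    shift = unit(out_emb - src_emb)
    return float(np.mean(1.0 - shift @ direction))


# Stand-in embeddings (assumption): in practice these would come from a
# CLIP image encoder applied to synthetic, real, and translated images.
rng = np.random.default_rng(0)
dim = 8
offset = np.zeros(dim)
offset[0] = 3.0  # makes the two toy domains separable along one axis
syn = rng.normal(size=(16, dim))
real = rng.normal(size=(16, dim)) + offset

d = domain_direction(syn, real)
loss_aligned = directional_loss(syn, syn + 0.5 * d, d)   # shift follows d
loss_opposed = directional_loss(syn, syn - 0.5 * d, d)   # shift opposes d
```

In this toy setup the loss is near 0 when every image moves along the dataset-level direction and near 2 when it moves the opposite way. Because every image is compared against the same direction, the guide stays consistent across the dataset, which is the property the abstract emphasizes.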

Tags: crowd counting, domain adaptation, image-to-image translation
