Parting With Illusions About Synthetic Data

Daniel Pototzky

SPS

Length: 0:10:02

28 Jun 2022

Lack of labeled data is a pervasive problem in deep learning for computer vision. Generating synthetic data seems to offer a simple solution: once a data generator is set up, arbitrary amounts of fully labeled data can be created. This data is then used to train a neural network that is ultimately applied to real data, e.g. to detect cars. We argue that using synthetic data in this way rarely works in practice. Synthetic images usually exhibit a significant domain gap relative to real data, which reduces performance on the target domain. Furthermore, the cost of creating a synthetic data generator can be substantial compared to manual labeling. In experiments on several synthetic-to-real benchmarks, including Sim10k to Cityscapes, we show that state-of-the-art domain adaptation methods trained on thousands of synthetic images are usually outperformed by ordinary supervised learning on 14 to 70 images from the target domain.
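
The cost argument in the abstract can be made concrete with a simple break-even calculation. The sketch below is only an illustration: the function name `break_even_images` and both cost figures are hypothetical and do not come from the paper. It computes how many real images could be labeled by hand for the price of building a generator, a number that can then be compared with the 14 to 70 target-domain images mentioned above.

```python
def break_even_images(generator_cost: float, label_cost_per_image: float) -> float:
    """Return how many manually labeled real images cost as much as the generator.

    Both arguments are in the same (arbitrary) currency unit.
    """
    if label_cost_per_image <= 0:
        raise ValueError("label_cost_per_image must be positive")
    return generator_cost / label_cost_per_image


# Hypothetical figures for illustration only (not taken from the paper).
generator_cost = 20_000.0     # assumed one-off cost of building a data generator
label_cost_per_image = 5.0    # assumed cost of annotating one real image

n = break_even_images(generator_cost, label_cost_per_image)
print(f"The generator costs as much as labeling {n:.0f} real images")
```

This deliberately ignores the domain gap itself; it only shows that the labeling budget displaced by a generator can be far larger than the handful of real images the abstract reports as sufficient.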

Tags:

IVMSP 2022

June 2022

2022

IVMSP

IEEE IVMSP 2022

June 26

Nafplio
