  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 0:10:02
28 Jun 2022

Lack of labeled data is an omnipresent issue in deep learning for computer vision. Generating synthetic data seems to offer a simple solution to this problem: once a data generator is set up, arbitrary amounts of fully labeled data can be created. This data is then used to train a neural network that must ultimately solve a task on real data, e.g. detecting cars. We argue that leveraging synthetic data in this way rarely works in practice. Synthetic images usually exhibit a significant domain gap to real data, which reduces performance on the target domain. Furthermore, the cost of creating a synthetic data generator can be substantial compared to manual labeling. In experiments on several synthetic-to-real benchmarks, including Sim10k to CityScapes, we show that state-of-the-art domain adaptation methods trained on thousands of synthetic images are usually outperformed by ordinary supervised learning on 14 to 70 images from the target domain.
