  • SPS
    Length: 00:07:50
22 Sep 2021

For scene graph generation, it is crucial to understand the relationships between objects in the context of the whole image. We design a label transformation method based on a Transformer-VAE (Variational Autoencoder) structure. Without supervision, it converts bounding box labels into auxiliary labels that encode each object's context. The scene graph model is then trained in a multi-task fashion on these auxiliary labels jointly with the bounding box labels and relation labels. Our approach requires no external datasets or language priors and can be applied to any scene graph generation model that infers relationships between pairs of objects. We validate the effectiveness and scalability of our method with state-of-the-art scene graph generation models on the VRD and Visual Genome (VG) datasets.
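The abstract does not give architectural details, so the following is only a toy sketch of the general idea under stated assumptions. It is not the authors' implementation. It assumes NumPy is available, and the names `encode_labels`, `ToyTransformerVAE`, `vae_loss` and `multitask_loss` are hypothetical. Each object's label (one-hot class plus a normalized box) is embedded. One self-attention layer then lets every object attend to the other objects in the same image, and a VAE head produces a per-object latent. Using the posterior mean `mu` as the auxiliary label is an assumption, and so is regressing it with an MSE loss weighted by `lambda_aux`. The weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)


def encode_labels(classes, boxes, num_classes):
    """Concatenate one-hot class labels with normalized [x1, y1, x2, y2] boxes."""
    one_hot = np.eye(num_classes)[classes]           # (N, C)
    return np.concatenate([one_hot, boxes], axis=1)  # (N, C + 4)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention across the objects of one image."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v                                   # (N, hidden)


class ToyTransformerVAE:
    """Hypothetical, untrained Transformer-VAE over the object labels of one image."""

    def __init__(self, in_dim, hidden, latent, scale=0.1):
        self.w_in = rng.normal(0.0, scale, (in_dim, hidden))
        self.w_q, self.w_k, self.w_v = (
            rng.normal(0.0, scale, (hidden, hidden)) for _ in range(3)
        )
        self.w_mu = rng.normal(0.0, scale, (hidden, latent))
        self.w_logvar = rng.normal(0.0, scale, (hidden, latent))
        self.w_dec = rng.normal(0.0, scale, (latent, in_dim))

    def forward(self, x):
        h = x @ self.w_in
        # Residual attention: each object's code now depends on the other objects.
        h = h + self_attention(h, self.w_q, self.w_k, self.w_v)
        mu, logvar = h @ self.w_mu, h @ self.w_logvar
        eps = rng.standard_normal(mu.shape)
        z = mu + np.exp(0.5 * logvar) * eps           # reparameterization trick
        recon = z @ self.w_dec                        # reconstruct the input labels
        return mu, logvar, z, recon


def vae_loss(x, recon, mu, logvar, beta=1.0):
    """Reconstruction MSE plus the KL divergence to a standard normal prior."""
    rec = np.mean((recon - x) ** 2)
    kl = -0.5 * np.mean(np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1))
    return rec + beta * kl, rec, kl


def multitask_loss(loss_obj, loss_rel, aux_pred, aux_target, lambda_aux=1.0):
    """Object loss + relation loss + weighted auxiliary-label regression loss."""
    loss_aux = np.mean((aux_pred - aux_target) ** 2)
    return loss_obj + loss_rel + lambda_aux * loss_aux


# Usage: three objects in one image, three object classes.
classes = np.array([0, 2, 1])
boxes = np.array([[0.1, 0.1, 0.4, 0.5],
                  [0.3, 0.2, 0.9, 0.8],
                  [0.0, 0.6, 0.3, 1.0]])
x = encode_labels(classes, boxes, num_classes=3)
vae = ToyTransformerVAE(in_dim=x.shape[1], hidden=16, latent=8)
mu, logvar, z, recon = vae.forward(x)
aux_labels = mu  # assumption: posterior mean serves as each object's auxiliary label
total, rec, kl = vae_loss(x, recon, mu, logvar)
```

In this sketch, the label-transformation network would first be trained on the VAE loss alone. Its `aux_labels` would then serve as fixed extra targets that the scene graph model predicts alongside its object and relation outputs, which `multitask_loss` combines.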
