ERBNet: An Effective Representation Based Network for Unbiased Scene Graph Generation
Wenxi Ma (Xiamen University); Tianxiang Hou (Xiamen University); Qianji Di (Xiamen University); Zhongang Qi (Tencent); Ying Shan (Tencent); Hanzi Wang (Xiamen University)
SPS
Scene graph generation (SGG) has attracted increasing attention in recent years. Its goal is to predict the relations between pairs of objects in an image. Because dataset annotations follow a long-tailed distribution, SGG performance remains far from satisfactory. Existing methods address this long-tailed problem with various unbiased learning strategies. However, we argue that the essence of the problem is that the classifier is seriously affected by the long-tailed data. To handle this issue, we propose a novel network named ERBNet, which contains a relation feature fusion (RFF) encoder that constructs effective representations of relations between objects, and a nearest class mean (NCM) classifier that predicts relations based on relation feature similarities. Extensive experiments show that ERBNet outperforms several state-of-the-art methods on the challenging Visual Genome dataset.
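To make the nearest class mean idea concrete, the following is a minimal sketch of a generic NCM classifier, not the ERBNet implementation. The abstract does not specify the similarity measure or how class means are computed, so cosine similarity and plain per-class averaging are assumptions here, and the names `class_means` and `ncm_predict` and the toy data are hypothetical.

```python
import numpy as np

def class_means(features, labels, num_classes):
    """Average the feature vectors of each class (assumed: plain mean)."""
    means = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():  # classes with no samples keep a zero vector
            means[c] = features[mask].mean(axis=0)
    return means

def ncm_predict(features, means):
    """Assign each feature to the class mean with the highest
    cosine similarity (assumed similarity measure)."""
    eps = 1e-12  # avoids division by zero for all-zero vectors
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    m = means / (np.linalg.norm(means, axis=1, keepdims=True) + eps)
    return (f @ m.T).argmax(axis=1)

# Toy long-tailed data: five samples of class 0, one sample of class 1.
train_x = np.array([[1.0, 0.1], [0.9, 0.0], [1.1, 0.2],
                    [1.0, -0.1], [0.8, 0.1], [0.1, 1.0]])
train_y = np.array([0, 0, 0, 0, 0, 1])

means = class_means(train_x, train_y, num_classes=2)
test_x = np.array([[0.9, 0.1], [0.0, 0.8]])
preds = ncm_predict(test_x, means)
```

Because each class contributes a single mean vector regardless of how many samples it has, the decision rule itself has no learned per-class weights that frequent classes can dominate, which is the usual motivation for using NCM on long-tailed data.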