10 Jun 2021

Visual Question Answering (VQA) is a challenging task that requires fine-grained semantic understanding of both visual and textual content. Existing works focus on better modality representations but give little consideration to the long-tailed data distribution of common VQA datasets. The extreme class imbalance biases training, so models perform well on head classes but fail on tail classes. We therefore propose a unified Adaptive Re-balancing Network (ARN) that handles classification in both head and tail classes, comprehensively improving VQA performance. Specifically, two training branches perform their duties iteratively: the network first learns universal representations, and the re-balancing branch then progressively emphasizes tail data through adaptive learning. Meanwhile, contextual information in the question is vital for guiding accurate visual attention, so the network is further equipped with a novel gate mechanism that assigns higher weight to contextual information. Experimental results on common benchmarks such as VQA-v2 demonstrate the superiority of our method over the state of the art.
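
The two-branch training idea can be pictured with a minimal PyTorch-style sketch. Everything here is an illustrative assumption rather than the authors' ARN implementation: the shared fused representation, the inverse-frequency weighting for the re-balancing branch, and the quadratic decay schedule for the mixing weight are common choices used only to make the mechanism concrete.

```python
# A minimal sketch of two-branch adaptive re-balancing (not the authors'
# ARN code). One head is trained on the natural answer distribution; the
# other uses class-reweighted loss, and its influence grows over epochs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchVQAHead(nn.Module):
    def __init__(self, fused_dim: int, num_answers: int):
        super().__init__()
        # Conventional branch: learns universal representations.
        self.head_branch = nn.Linear(fused_dim, num_answers)
        # Re-balancing branch: emphasizes tail answers.
        self.tail_branch = nn.Linear(fused_dim, num_answers)

    def forward(self, fused: torch.Tensor):
        return self.head_branch(fused), self.tail_branch(fused)

def rebalanced_loss(logits_head, logits_tail, targets,
                    class_counts, epoch, total_epochs):
    """Mix a standard loss with a class-reweighted loss, shifting weight
    toward the re-balancing branch as training progresses."""
    # Inverse-frequency weights for the tail branch (one common choice).
    weights = class_counts.sum() / (len(class_counts) * class_counts.clamp(min=1))
    ce_head = F.cross_entropy(logits_head, targets)
    ce_tail = F.cross_entropy(logits_tail, targets, weight=weights)
    # Adaptive mixing: alpha decays from 1 toward 0, so early epochs train
    # the universal branch and later epochs emphasize the tail branch.
    alpha = 1.0 - (epoch / max(total_epochs - 1, 1)) ** 2
    return alpha * ce_head + (1.0 - alpha) * ce_tail
```

Because alpha starts near 1, early epochs train the conventional branch on the natural distribution; as alpha decays, the reweighted loss of the re-balancing branch dominates, mirroring the "learn universal representations first, then progressively emphasize tail data" schedule described in the abstract.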

Chairs:
Zheng-Hua Tan
