  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 09:55
07 Jul 2020

Visual Question Answering (VQA) is a comprehensive task that requires answering questions about the visual content of an image. However, a number of studies have pointed out that VQA models rely heavily on superficial correlations between questions and answers, predicting the answer from textual statistical correlations alone without truly understanding the visual content. To address this issue, we propose an answer re-ranking VQA model, called RankVQA, in which the role of the input image is re-examined in order to select the most relevant answer from a set of candidate answers generated by a typical VQA model. Specifically, we rank the candidate answers by their relevance to the visual content of the input image and to a set of question-related image captions, respectively. Extensive experiments on two datasets, VQA v2 and VQA-CP v2, demonstrate the effectiveness of the proposed model, which achieves state-of-the-art performance on both datasets.
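The abstract does not say how relevance is measured or how the two rankings are combined, so the following is only a minimal, hypothetical sketch of the general idea of answer re-ranking, not the RankVQA method. It assumes that each candidate answer, the image, and each question-related caption have already been embedded into a shared vector space (the `Candidate` class, the `embedding` vectors, and the `rerank` function and its weights `w_vqa`, `w_img`, `w_cap` are all illustrative assumptions). Relevance is taken to be cosine similarity, and the final score is assumed to be a weighted sum of the base VQA confidence, the image relevance, and the best caption relevance.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate answer from a base VQA model (hypothetical structure)."""
    answer: str
    vqa_score: float        # base model's confidence in this answer
    embedding: list[float]  # hypothetical answer embedding in a shared space


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(
    candidates: list[Candidate],
    image_vec: list[float],
    caption_vecs: list[list[float]],
    w_vqa: float = 1.0,
    w_img: float = 1.0,
    w_cap: float = 1.0,
) -> list[str]:
    """Order candidate answers by an assumed weighted sum of three scores."""
    scored = []
    for c in candidates:
        # relevance of the answer to the image content (assumed: cosine similarity)
        img_rel = cosine(c.embedding, image_vec)
        # relevance to the best-matching question-related caption
        cap_rel = max((cosine(c.embedding, v) for v in caption_vecs), default=0.0)
        total = w_vqa * c.vqa_score + w_img * img_rel + w_cap * cap_rel
        scored.append((total, c.answer))
    # highest combined score first
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [answer for _, answer in scored]


if __name__ == "__main__":
    candidates = [
        Candidate("yes", 0.6, [0.0, 1.0]),  # favored by the textual prior
        Candidate("red", 0.4, [1.0, 0.0]),  # consistent with the image
    ]
    print(rerank(candidates, image_vec=[1.0, 0.0], caption_vecs=[[0.9, 0.1]]))
    # → ['red', 'yes']
```

In this toy example the base model alone would pick "yes", but once visual and caption relevance are added to the score, "red" moves to the top. Setting `w_img` and `w_cap` to zero recovers the base model's original ordering, which makes the contribution of the re-ranking step easy to isolate.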
