VC-VQA: VISUAL CALIBRATION MECHANISM FOR VISUAL QUESTION ANSWERING
Yanyuan Qiao, Zheng Yu, Jing Liu
Visual Question Answering (VQA) is a comprehensive task that answers questions about the visual content of an image. Recently, a number of studies have pointed out that VQA models tend to be misled by dataset biases and rely heavily on superficial correlations between questions and answers rather than truly understanding the visual content. To address this issue, we propose a visual calibration mechanism for VQA (VC-VQA), which extends a conventional VQA model with an additional image-feature reconstruction module. The proposed model reconstructs image features from the predicted answer and the question, and measures the similarity between the reconstructed and the original image features; this similarity guides the VQA model in predicting the final answer. We evaluate our model on both the VQA v1 and VQA v2 datasets, showing that VC-VQA effectively reduces the impact of dataset bias and achieves competitive performance compared with other mainstream methods.
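To make the calibration idea concrete, the sketch below shows one plausible way to implement the reconstruction-and-similarity step described above: a small network maps the question and predicted-answer embeddings back to the image-feature space, and a cosine similarity between the reconstructed and original image features serves as the calibration signal. This is a minimal illustration only; the layer sizes, the fusion by concatenation, and the choice of cosine similarity are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualCalibration(nn.Module):
    """Illustrative sketch (not the authors' exact model): reconstruct an
    image-feature vector from question and answer embeddings, then score
    the answer by its similarity to the original image feature."""

    def __init__(self, q_dim, a_dim, v_dim, hidden=1024):
        super().__init__()
        # Simple two-layer MLP that maps (question, answer) -> image-feature space.
        self.reconstruct = nn.Sequential(
            nn.Linear(q_dim + a_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, v_dim),
        )

    def forward(self, q_feat, a_feat, v_feat):
        # q_feat: (B, q_dim) question embedding
        # a_feat: (B, a_dim) embedding of the predicted answer
        # v_feat: (B, v_dim) original image feature
        v_rec = self.reconstruct(torch.cat([q_feat, a_feat], dim=-1))
        # Cosine similarity acts as the calibration signal, which could be
        # combined with the base VQA scores to select the final answer.
        sim = F.cosine_similarity(v_rec, v_feat, dim=-1)
        return v_rec, sim


if __name__ == "__main__":
    # Hypothetical feature dimensions for a quick shape check.
    calib = VisualCalibration(q_dim=512, a_dim=300, v_dim=2048)
    q = torch.randn(4, 512)
    a = torch.randn(4, 300)
    v = torch.randn(4, 2048)
    v_rec, sim = calib(q, a, v)
    print(v_rec.shape, sim.shape)  # torch.Size([4, 2048]) torch.Size([4])
```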