VC-VQA: VISUAL CALIBRATION MECHANISM FOR VISUAL QUESTION ANSWERING
Yanyuan Qiao, Zheng Yu, Jing Liu
Visual Question Answering (VQA) is a comprehensive task that answers questions about the visual content of an image. Recently, a number of studies have pointed out that VQA models tend to be misled by dataset biases and rely heavily on superficial correlations between questions and answers rather than truly understanding the visual content. To address this issue, we propose a visual calibration mechanism for VQA (VC-VQA), which extends a conventional VQA model with an additional image-feature reconstruction module. The proposed model reconstructs image features from the predicted answer and the question, and measures the similarity between the reconstructed and the original image features; this similarity guides the VQA model in predicting the final answer. We evaluate our model on both the VQA v1 and VQA v2 datasets, showing that VC-VQA effectively reduces the impact of dataset bias and achieves competitive performance compared with other mainstream methods.
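To make the calibration idea concrete, the sketch below shows one plausible way to implement the reconstruction-and-similarity step described above: a small network maps the question and predicted-answer embeddings back to the image-feature space, and a cosine similarity between the reconstructed and original image features serves as the calibration signal. This is a minimal illustration only; the layer sizes, the fusion by concatenation, and the choice of cosine similarity are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualCalibration(nn.Module):
    """Illustrative sketch (not the authors' exact model): reconstruct an
    image-feature vector from question and answer embeddings, then score
    the answer by its similarity to the original image feature."""

    def __init__(self, q_dim, a_dim, v_dim, hidden=1024):
        super().__init__()
        # Simple two-layer MLP that maps (question, answer) -> image-feature space.
        self.reconstruct = nn.Sequential(
            nn.Linear(q_dim + a_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, v_dim),
        )

    def forward(self, q_feat, a_feat, v_feat):
        # q_feat: (B, q_dim) question embedding
        # a_feat: (B, a_dim) embedding of the predicted answer
        # v_feat: (B, v_dim) original image feature
        v_rec = self.reconstruct(torch.cat([q_feat, a_feat], dim=-1))
        # Cosine similarity acts as the calibration signal, which could be
        # combined with the base VQA scores to select the final answer.
        sim = F.cosine_similarity(v_rec, v_feat, dim=-1)
        return v_rec, sim


if __name__ == "__main__":
    # Hypothetical feature dimensions for a quick shape check.
    calib = VisualCalibration(q_dim=512, a_dim=300, v_dim=2048)
    q = torch.randn(4, 512)
    a = torch.randn(4, 300)
    v = torch.randn(4, 2048)
    v_rec, sim = calib(q, a, v)
    print(v_rec.shape, sim.shape)  # torch.Size([4, 2048]) torch.Size([4])
```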