Optimal Noise-Aware Imaging With Switchable Prefilters

Zilai Gong, Masayuki Tanaka, Yusuke Monno, Masatoshi Okutomi

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:11:45
04 Oct 2022

Visual question answering (VQA) answers text-based questions about images. The main difficulty of VQA lies in accurately localizing the image region relevant to the question. In this paper, we introduce the ques-to-visual (q2v) feature as an additional input to VQA to tackle this problem. The q2v feature is generated according to the semantics of the question and contains visual semantics that help locate the question-relevant region. We then use self-attention to model the intra-relationships within each modality, enhancing the q2v, image, and text features. The enhanced features are fused by spatial guided-attention and multi-scale channel-attention modules for answer prediction. Experimental results on the VQA2.0 benchmark dataset show that our method outperforms competing methods.
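The pipeline above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature dimensions, the single-head scaled dot-product attention, and the simple additive combination of the image and q2v features before guided attention are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # single-head scaled dot-product self-attention over one modality;
    # x: (n_tokens, d) feature matrix
    d = x.shape[-1]
    weights = softmax(x @ x.T / np.sqrt(d))
    return weights @ x

def guided_attention(query_feat, key_feat):
    # query features attend over the other modality's positions
    d = key_feat.shape[-1]
    weights = softmax(query_feat @ key_feat.T / np.sqrt(d))
    return weights @ key_feat

rng = np.random.default_rng(0)
q = rng.standard_normal((14, 64))    # question-token features (hypothetical dims)
v = rng.standard_normal((49, 64))    # image-region features
q2v = rng.standard_normal((49, 64))  # q2v feature, aligned with image regions

# enhance each modality with intra-modality self-attention
q_e, v_e, q2v_e = self_attention(q), self_attention(v), self_attention(q2v)

# fuse: question tokens attend over the (image + q2v) visual features
fused = guided_attention(q_e, v_e + q2v_e)
print(fused.shape)  # (14, 64)
```

In this sketch the q2v feature simply biases the visual representation that the question attends over; the paper's spatial guided-attention and multi-scale channel-attention modules are richer learned fusion blocks.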
