  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:06:24
03 Oct 2022

In this paper, we present a \textit{Grad-CAM} aware supervised attention framework for visual question answering (VQA) tasks for post-disaster damage assessment. Visual attention in VQA aims to focus on image regions relevant to the question in order to predict answers. However, conventional attention mechanisms in VQA operate in an unsupervised manner, learning to weight visual content by minimizing only the task-specific loss. This approach fails to provide appropriate visual attention when the visual content is very complex. The content and nature of the UAV images in the \textit{FloodNet-VQA} dataset are very complex, as they depict the hazardous scenes left by \textit{Hurricane Harvey} from a high altitude. To tackle this, we propose a supervised attention mechanism that uses explainable features from \textit{Grad-CAM} to supervise visual attention in the VQA pipeline. The proposed mechanism operates in two stages. In the first stage, we derive visual explanations through \textit{Grad-CAM} by training a baseline attention-based VQA model. In the second stage, we supervise the visual attention for each question by incorporating the \textit{Grad-CAM} explanations obtained in the first stage. Our model improves over state-of-the-art VQA models by a considerable margin on the \textit{FloodNet} dataset.
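The two-stage idea above can be sketched as an auxiliary supervision term added to the task loss: after a baseline VQA model yields Grad-CAM maps (stage one), stage two penalizes the distance between the model's attention map and the Grad-CAM explanation. The snippet below is a minimal illustration, not the paper's exact formulation; the MSE distance, the `lam` weight, and all function names are assumptions for the sketch.

```python
import numpy as np

def normalize_map(m):
    """Shift a saliency/attention map to be non-negative and sum to 1,
    so the two maps are compared as distributions."""
    m = m - m.min()
    s = m.sum()
    return m / s if s > 0 else np.full_like(m, 1.0 / m.size)

def supervised_attention_loss(attn, gradcam, task_loss, lam=0.5):
    """Combine the VQA task loss with an attention-supervision term that
    pulls the attention map `attn` toward the Grad-CAM map `gradcam`.
    `lam` (assumed) balances the two terms. Returns (total, supervision)."""
    a = normalize_map(np.asarray(attn, dtype=float))
    g = normalize_map(np.asarray(gradcam, dtype=float))
    sup = float(np.mean((a - g) ** 2))  # MSE between normalized maps
    return task_loss + lam * sup, sup

# When attention already matches the explanation, the extra term vanishes
total, sup = supervised_attention_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], 0.7)
```

In practice the supervision term would be computed per question over the spatial attention grid and backpropagated jointly with the answer-classification loss; a KL divergence between the two normalized maps is an equally plausible choice of distance.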
