MULTI-SEMANTIC ALIGNMENT CO-REASONING NETWORK FOR VIDEO QUESTION ANSWERING

Min Peng, Liangchen Liu, Zhenghao Li, Yu Shi, Xiangdong Zhou

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
Lecture 09 Oct 2023

Video question answering challenges models to understand textual questions of varying complexity and to search for clues in visual content with different hierarchical semantics. In this paper, we propose a novel Multi-Semantic Alignment Co-Reasoning Network (MACN) that performs interactive inference between the question and the video input. MACN comprises two modules: Question-Centric Interaction (QCI) and Contextual Semantic Reasoning (CSR). Specifically, QCI builds a question-centric heterogeneous graph that aligns visual content at different temporal scales with the question, so that visual representations are extracted under a better understanding of the text. CSR uses self-attention mechanisms to capture the contextual dependencies among visual semantics at different hierarchies, enabling co-reasoning over answer clues. Experiments on three benchmarks show that our method outperforms previous state-of-the-art methods.
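The abstract describes two stages: aligning the question with visual content at several temporal scales (QCI), then applying self-attention across those hierarchical visual semantics (CSR). The sketch below illustrates only that data flow, under heavy simplifying assumptions: feature vectors are plain Python lists, temporal scales are formed by mean-pooling frame features over hypothetical window sizes, the question-centric heterogeneous graph is reduced to scaled dot-product attention from a single question node to each scale's visual nodes, and CSR is single-head self-attention with no learned projections. All function names, window sizes, and the dot-product answer head are hypothetical; this is not the authors' MACN implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    # Scaled dot-product attention for a single query vector.
    weights = softmax([dot(query, k) / math.sqrt(len(query)) for k in keys])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def mean_pool(frames: List[Vector], window: int) -> List[Vector]:
    # Average consecutive frame features in non-overlapping windows
    # (hypothetical way to build visual nodes at a coarser temporal scale).
    segments = []
    for start in range(0, len(frames), window):
        chunk = frames[start:start + window]
        segments.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return segments


def question_centric_alignment(question: Vector, frames: List[Vector],
                               windows: List[int]) -> List[Vector]:
    # Simplified stand-in for QCI: at each temporal scale, the question node
    # attends over that scale's visual nodes and gathers a question-aligned summary.
    aligned = []
    for w in windows:
        nodes = mean_pool(frames, w)
        aligned.append(attend(question, nodes, nodes))
    return aligned


def contextual_self_attention(tokens: List[Vector]) -> List[Vector]:
    # Simplified stand-in for CSR: single-head self-attention across the
    # scale-level tokens, without learned query/key/value projections.
    return [attend(t, tokens, tokens) for t in tokens]


def answer_scores(question: Vector, frames: List[Vector],
                  candidates: List[Vector], windows=(1, 2, 4)) -> Vector:
    aligned = question_centric_alignment(question, frames, list(windows))
    context = contextual_self_attention(aligned)
    # Average the contextualized scale tokens into one clip-level vector.
    fused = [sum(col) / len(context) for col in zip(*context)]
    # Score each candidate answer embedding by dot product (hypothetical answer head).
    return [dot(fused, c) for c in candidates]


if __name__ == "__main__":
    frames = [[0.1 * t, 1.0 - 0.1 * t, 0.5] for t in range(8)]  # toy features for 8 frames
    question = [1.0, 0.0, 0.5]
    candidates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(answer_scores(question, frames, candidates))
```

Mean pooling stands in for whatever multi-scale clip encoder the paper uses, and dropping learned projections keeps the sketch dependency-free; a trained model would learn those projections and the answer head end to end.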
