Visual Graph Reasoning Network

Dingbang Li (East China Normal University); Xin Lin (East China Normal University); Haibin Cai (East China Normal University); Wenzhou Chen (Zhejiang University)

07 Jun 2023

Visual question answering (VQA) is a fundamental and challenging cross-modal task that requires a model to fully understand an image's content and reason out the answer to a question about it. Existing VQA models understand visual content mainly through bottom-up features or grid features, and both have drawbacks. Bottom-up features are discrete and independent of one another, which prevents models from adequately performing relational reasoning. Grid features partition the image into fixed cells, which fragments meaningful visual regions and limits the model's cross-modal alignment capability. We therefore propose a more flexible representation called Visual Graph. It connects image patches according to semantic similarity and spatial relevance in order to model their potential relationships and to cluster adjacent homologous patches. Based on the Visual Graph, we design a Visual Graph Reasoning Network for VQA. We evaluate our model on GQA and VQA-v2. The experimental results show that it achieves excellent performance among single models.
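The abstract does not spell out how the Visual Graph is built, so the following is only a minimal sketch of the general idea it describes: link patches that are both spatially close and semantically similar, then group linked patches together. Everything concrete here is an assumption made for illustration and not the authors' method: cosine similarity as the semantic measure, an 8-neighbourhood as the spatial criterion, connected components as the clustering step, and the hypothetical names `build_visual_graph`, `cluster_patches`, `sim_threshold`, and `max_dist`.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_visual_graph(features, grid_w, sim_threshold=0.8, max_dist=1):
    """Connect patches that are both spatially close and semantically similar.

    features: per-patch feature vectors in row-major grid order.
    grid_w: number of patches per grid row.
    sim_threshold, max_dist: hypothetical hyperparameters (not from the paper).
    """
    edges = set()
    for i, j in combinations(range(len(features)), 2):
        ri, ci = divmod(i, grid_w)
        rj, cj = divmod(j, grid_w)
        # Chebyshev distance; max_dist=1 corresponds to the 8-neighbourhood.
        close = max(abs(ri - rj), abs(ci - cj)) <= max_dist
        if close and cosine(features[i], features[j]) >= sim_threshold:
            edges.add((i, j))
    return edges


def cluster_patches(num_patches, edges):
    """Group patches into connected components using union-find."""
    parent = list(range(num_patches))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for p in range(num_patches):
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())


# Toy 2x3 grid: the left two columns share one appearance, the right column another.
features = [
    [1.0, 0.0], [0.9, 0.1], [0.0, 1.0],
    [1.0, 0.1], [0.95, 0.0], [0.1, 1.0],
]
edges = build_visual_graph(features, grid_w=3)
clusters = cluster_patches(len(features), edges)
print(clusters)
```

On this toy grid the four left-hand patches end up in one group and the two right-hand patches in another, which mirrors the abstract's point that adjacent homologous patches can be merged instead of staying fragmented across grid cells. A real model would use learned patch embeddings and would likely keep the edges as input to a graph reasoning module, which this sketch does not attempt.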

Tags:

Multimedia perception and processing for autonomous systems

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
