An Affinity-driven Relation Network for Figure Question Answering
Jialong Zou, Guoli Wu, Taofeng Xue, Qingfeng Wu
Figure question answering (FQA) is a recently introduced multimodal task closely related to visual question answering (VQA): given a scientific-style figure and a related question, a machine must answer the question through reasoning. The Relation Network (RN), the earliest approach proposed for FQA, computes representations of the relations between objects within an image to infer the answer. However, RN generates a large number of relation features, which complicates the reasoning process and limits performance. To address this problem, we introduce a novel framework consisting of a deconvolutional network, an LSTM network and an affinity-driven relation network. Specifically, the deconvolutional network enhances feature fusion by combining low-level and high-level image features. The affinity-driven relation network efficiently represents the intra-relations within images and the inter-relations between images and questions, making the reasoning process more effective. Experimental results show that our approach outperforms most state-of-the-art methods on FQA tasks.
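For context on the pairwise-relation computation the abstract refers to, below is a minimal PyTorch sketch of the generic Relation Network idea (every object pair combined with the question embedding, then aggregated). It illustrates why the number of relation features grows quadratically; it is not the authors' affinity-driven variant, and all dimensions, layer sizes and names (obj_dim, q_dim, answer-vocabulary size) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    """Generic relation-network sketch: every pair of image "objects"
    (flattened CNN feature-map cells) is scored together with the
    question embedding, and the pairwise relation features are summed.
    Dimensions are illustrative assumptions, not the paper's settings."""

    def __init__(self, obj_dim=256, q_dim=128, hidden=256, n_answers=32):
        super().__init__()
        # g: processes one (object_i, object_j, question) triple.
        self.g = nn.Sequential(
            nn.Linear(2 * obj_dim + q_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # f: maps the aggregated relation feature to answer logits.
        self.f = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_answers),
        )

    def forward(self, objects, question):
        # objects: (B, N, obj_dim) feature-map cells; question: (B, q_dim)
        B, N, D = objects.shape
        o_i = objects.unsqueeze(2).expand(B, N, N, D)    # object i
        o_j = objects.unsqueeze(1).expand(B, N, N, D)    # object j
        q = question.unsqueeze(1).unsqueeze(1).expand(B, N, N, question.size(-1))
        pairs = torch.cat([o_i, o_j, q], dim=-1)          # all N*N pairs
        relations = self.g(pairs).sum(dim=(1, 2))         # aggregate relations
        return self.f(relations)                          # answer logits
```

Note the N*N pair enumeration: with the dense feature maps produced by a figure encoder, N is large, which is exactly the relation-feature explosion the affinity-driven relation network is designed to mitigate.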