Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering

Lei Shi, Shijie Geng, Peng Gao, Songxiang Liu, Kai Shuang, Sen Su, Chiori Hori

Length: 12:51

04 May 2020

Multi-modality fusion technologies have greatly improved the performance of neural network-based Video Description/Captioning, Visual Question Answering (VQA) and Audio Visual Scene-aware Dialog (AVSD) in recent years. Most previous approaches explore only the last layers of multi-layer feature fusion while overlooking the importance of intermediate layers. To address the intermediate layers, we propose an efficient Quaternion Block Network (QBN) that learns interactions not only for the last layer but also for all intermediate layers simultaneously. In our proposed QBN, we use the holistic text features to guide the update of visual features. At the same time, Hamilton quaternion products efficiently carry information from higher layers to lower layers for both the visual and text modalities. The evaluation results show that our QBN improves performance on VQA 2.0 and even surpasses approaches that use large-scale BERT or visual BERT pre-trained models. An extensive ablation study has been carried out to examine the influence of each proposed module.
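
For readers unfamiliar with the Hamilton quaternion product mentioned in the abstract, the following is a minimal sketch of the standard multiplication rule for two quaternions. It illustrates only the textbook definition, not the QBN architecture or how the authors apply the product to feature layers; the function name `hamilton_product` and the 4-tuple representation `(r, i, j, k)` are assumptions made for illustration.

```python
def hamilton_product(p, q):
    """Return the Hamilton product p * q of two quaternions.

    Each quaternion is a 4-tuple (r, i, j, k) standing for
    r + i*I + j*J + k*K. This representation is an illustrative
    assumption, not taken from the paper.
    """
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # I component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # J component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # K component
    )


# Basis elements, used to check the defining identities.
I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)

print(hamilton_product(I, J))  # I * J = K
print(hamilton_product(J, I))  # J * I = -K (the product is not commutative)
```

Every output component mixes all four components of both inputs, which is the usual intuition for why the product is a compact way to let several feature groups interact.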

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020
