Nested Attention Network with Graph Filtering for Visual Question and Answering
Jing Lu (China University of Petroleum (East China)); Chunlei Wu (China University of Petroleum (East China)); Leiquan Wang (UPC); Shaozu Yuan (UPC); Jie Wu (China University of Petroleum)
SPS
Visual Question Answering (VQA), which requires generating an answer by understanding both visual and textual content, has recently attracted considerable research interest. Most existing works extract visual features with a CNN and learn feature embeddings with an attention mechanism. However, this mechanism may ignore the interactions between entities in the image, which can make the generated answer imprecise. To better explore the relationships between different entities in the image, a novel Nested Attention Network with Graph Filtering (NANGF) is proposed. It consists of two newly designed modules: a graph filtering mechanism, which mines more precise visual semantics and avoids deviations in understanding, and a nested attention mechanism, which effectively guides the integration of visual features and question features. Extensive experiments on the VQA 2.0 dataset demonstrate the effectiveness of the proposed method.