
Two-Stream Hybrid Attention Network For Multimodal Classification

Qipin Chen, Zhenyu Shi, Zhen Zuo, Jinmiao Fu, Yi Sun

20 Sep 2021

On modern e-commerce platforms like Amazon, the number of products is growing fast, and precise, efficient product classification becomes a key lever for a great customer shopping experience. A major challenge in tackling the large-scale product classification problem is how to leverage multimodal product information (e.g., image, text). One of the most successful directions is attention-based deep multimodal learning, where there are two main types of frameworks: 1) keyless attention, which learns the importance of features within each modality; and 2) key-based attention, which learns the importance of features using other modalities. In this paper, we propose a novel Two-stream Hybrid Attention Network (HANet), which leverages both key-based and keyless attention mechanisms to capture the key information across the product image and title modalities. We experimentally show that our HANet achieves state-of-the-art performance on an Amazon-scale product classification problem.
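As a rough illustration of the two attention styles the abstract contrasts, the sketch below shows a keyless attention module (weights features within one modality) and a key-based attention module (weights one modality's features using a query derived from the other), fused by concatenation for classification. This is a minimal PyTorch-style sketch under assumed shapes and module names; it is not the authors' HANet implementation, and the concatenation-based fusion head is an assumption for illustration only.

```python
# Illustrative sketch only -- not the authors' HANet code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeylessAttention(nn.Module):
    """Learns importance weights for features within a single modality."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):                            # feats: (batch, seq, dim)
        weights = F.softmax(self.score(feats), dim=1)     # (batch, seq, 1)
        return (weights * feats).sum(dim=1)               # (batch, dim)

class KeyBasedAttention(nn.Module):
    """Weights one modality's features using a query from the other modality."""
    def __init__(self, dim):
        super().__init__()
        self.query_proj = nn.Linear(dim, dim)
        self.key_proj = nn.Linear(dim, dim)

    def forward(self, query, feats):                      # query: (batch, dim)
        q = self.query_proj(query).unsqueeze(1)           # (batch, 1, dim)
        k = self.key_proj(feats)                          # (batch, seq, dim)
        scores = (q * k).sum(-1, keepdim=True) / feats.size(-1) ** 0.5
        weights = F.softmax(scores, dim=1)
        return (weights * feats).sum(dim=1)               # (batch, dim)

class TwoStreamHybridFusion(nn.Module):
    """Toy fusion head: keyless + key-based streams over image and title features."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.keyless_img = KeylessAttention(dim)
        self.keyless_txt = KeylessAttention(dim)
        self.img_attended_by_txt = KeyBasedAttention(dim)
        self.txt_attended_by_img = KeyBasedAttention(dim)
        self.classifier = nn.Linear(4 * dim, num_classes)

    def forward(self, img_feats, txt_feats):              # both: (batch, seq, dim)
        img_self = self.keyless_img(img_feats)
        txt_self = self.keyless_txt(txt_feats)
        img_cross = self.img_attended_by_txt(txt_self, img_feats)
        txt_cross = self.txt_attended_by_img(img_self, txt_feats)
        fused = torch.cat([img_self, txt_self, img_cross, txt_cross], dim=-1)
        return self.classifier(fused)
```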

