Two-Pathway Transformer Network For Video Action Recognition

Bo Jiang, Jiahong Yu, Lei Zhou, Kailin Wu, Yang Yang

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:06:19
20 Sep 2021

Traditional two-stream neural networks have shown that both appearance and motion information are important for video action recognition. However, they fuse the two streams by naively averaging their scores at the end of the framework, which neglects the underlying relationship between these two kinds of information. In this paper, we propose a two-pathway transformer network that uses memory-based attention to explore this relationship and thereby improve classification performance. Specifically, a transformer-based decoder takes one pathway's features as the query and the other pathway's features as the key and value. Then, based on the similarity matrix estimated from the query and key, relevant information is selected from the value to enhance the query for the final classification task. Experiments demonstrate that the proposed method outperforms existing strategies that fuse the two streams at the end of the network.
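The query/key/value step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain scaled dot-product attention with a residual addition, and it omits the learned projections, multi-head structure, and normalization layers that a real transformer decoder would include. The function name `cross_attention` and the toy feature values are hypothetical, chosen only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, key, value):
    """Enhance `query` features with information selected from `value`.

    query: list of n_q feature vectors from one pathway (e.g. appearance)
    key, value: lists of n_k feature vectors from the other pathway
    Assumption: all vectors share the same dimension, so the attended
    output can be added back to the query as a residual.
    """
    dim = len(query[0])
    scale = 1.0 / math.sqrt(dim)
    enhanced, weights_all = [], []
    for q in query:
        # One row of the similarity matrix: scaled dot product of q with each key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in key]
        weights = softmax(scores)
        # Weighted sum of the value vectors: the "selected" information.
        selected = [
            sum(w * v[j] for w, v in zip(weights, value))
            for j in range(len(value[0]))
        ]
        # Residual addition: the query is enhanced, not replaced.
        enhanced.append([qi + si for qi, si in zip(q, selected)])
        weights_all.append(weights)
    return enhanced, weights_all


# Toy features (made up): 2 appearance tokens, 3 motion tokens, dimension 2.
appearance = [[1.0, 0.0], [0.0, 1.0]]
motion = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# The appearance pathway queries the motion pathway.
enhanced, weights = cross_attention(appearance, motion, motion)
```

Swapping the arguments, so that motion features form the query and appearance features form the key and value, would give the attention in the opposite direction. Whether and how the paper combines the two directions is not stated in the abstract.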