Implicit Attention-based Cross-modal Collaborative Learning for Action Recognition

Jianghao Zhang, Xian Zhong, Wenxuan Liu, Kui Jiang, Zhengwei Yang, Zheng Wang

Poster 10 Oct 2023

Human action recognition has been an active research topic in recent years. Multiple modalities often convey heterogeneous but potentially complementary action information that a single modality does not hold. Some efforts have explored cross-modal representations to improve modeling capability, but with limited gains due to the simple fusion of different modalities. To this end, we propose impliCit attention-based Cross-modal Collaborative Learning (C3L) for action recognition. Specifically, we apply a Modality Generalization network with Grayscale enhancement (MGG) to learn modality-specific representations and interactions (infrared and RGB). Then, we construct a unified representation space through the Uniform Modality Representation (UMR) module, which preserves modality information while enhancing the overall representation ability. Finally, the feature extractors adaptively leverage modality-specific knowledge to realize cross-modal collaborative learning. Extensive experiments conducted on three widely used public benchmarks, InfAR, HMDB51, and UCF101, demonstrate the effectiveness of our proposed method.
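To make the described pipeline concrete, below is a minimal PyTorch sketch of the abstract's structure: two modality-specific encoders (RGB with a grayscale-enhanced input, and infrared), a shared projection standing in for the UMR module, and an implicit attention weighting over modality features for collaborative fusion. All names and design choices here (grayscale_enhance, ModalityEncoder, C3LSketch, the attention layout) are illustrative assumptions; the paper's actual MGG and UMR architectures are not specified in this abstract.

```python
# Hypothetical sketch of the C3L-style pipeline; not the authors' implementation.
import torch
import torch.nn as nn


def grayscale_enhance(rgb):
    # Grayscale enhancement (assumption): append a luminance channel so the
    # RGB branch also sees a modality-agnostic view, as MGG's name suggests.
    gray = 0.299 * rgb[:, 0:1] + 0.587 * rgb[:, 1:2] + 0.114 * rgb[:, 2:3]
    return torch.cat([rgb, gray], dim=1)  # (B, 4, H, W)


class ModalityEncoder(nn.Module):
    """Modality-specific feature extractor (stand-in for an MGG branch)."""
    def __init__(self, in_ch, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, x):
        return self.net(x)  # (B, dim)


class C3LSketch(nn.Module):
    """Unified space (UMR stand-in) + implicit attention over modalities."""
    def __init__(self, dim=128, num_classes=51):
        super().__init__()
        self.rgb_enc = ModalityEncoder(in_ch=4, dim=dim)  # RGB + grayscale channel
        self.ir_enc = ModalityEncoder(in_ch=1, dim=dim)   # infrared
        self.unify = nn.Linear(dim, dim)                  # shared projection
        self.attn = nn.Linear(dim, 1)                     # implicit attention scores
        self.head = nn.Linear(dim, num_classes)

    def forward(self, rgb, ir):
        feats = torch.stack([
            self.unify(self.rgb_enc(grayscale_enhance(rgb))),
            self.unify(self.ir_enc(ir)),
        ], dim=1)                                   # (B, 2, dim)
        w = torch.softmax(self.attn(feats), dim=1)  # per-modality weights
        fused = (w * feats).sum(dim=1)              # collaborative fusion
        return self.head(fused)


# Usage with dummy tensors: a batch of RGB frames and aligned infrared frames.
model = C3LSketch()
logits = model(torch.randn(2, 3, 112, 112), torch.randn(2, 1, 112, 112))
print(logits.shape)  # torch.Size([2, 51])
```

The softmax over the two projected modality features is one plausible reading of "implicit attention": the fusion weights are learned from the features themselves rather than fixed, letting each extractor adaptively draw on modality-specific knowledge.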
