A MULTI-MODAL TRANSFORMER APPROACH FOR FOOTBALL EVENT CLASSIFICATION

Yixiao Zhang, Baihua Li, Hui Fang, Qinggang Meng


Poster 11 Oct 2023

Video understanding has been enhanced by the use of multi-modal networks. However, recent multi-modal video analysis models have limited applicability to sports videos because of the specialised nature of sports footage. This paper proposes a novel attention-based multi-modal neural network for sports event classification, trained with a multi-stage fusion strategy. The proposed network integrates three modalities to improve sports video classification performance: an image sequence modality, an audio modality, and a newly proposed sports formation modality. Empirical results show that the proposed model outperforms the state-of-the-art transformer-based video method by 4.43% in top-1 accuracy on the SoccerNet-v2 dataset.
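As a rough illustration of what attention-based fusion of three modalities can look like, the sketch below runs single-head scaled dot-product self-attention over one embedding per modality and mean-pools the result. It is not the authors' architecture: the function names, the identity query/key/value projections, the pooling step, and the toy 4-dimensional embeddings are all assumptions made for the example, and the multi-stage training strategy is not modelled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(tokens):
    """Fuse modality embeddings with self-attention, then mean pooling.

    tokens: list of equal-length float vectors, one per modality.
    Identity query/key/value projections are used here (an illustrative
    simplification; a trained model would learn these projections).
    Returns the fused vector and the attention weights of each token.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended, all_weights = [], []
    for query in tokens:
        # Scaled dot-product score of this modality against every modality.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        all_weights.append(weights)
        # Weighted sum of the modality vectors.
        attended.append(
            [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]
        )
    # Mean-pool the attended tokens into a single fused representation.
    fused = [sum(tok[i] for tok in attended) / len(attended) for i in range(dim)]
    return fused, all_weights


# Hypothetical toy embeddings, one per modality.
image_seq = [0.2, 0.1, 0.4, 0.3]
audio = [0.0, 0.5, 0.1, 0.2]
formation = [0.3, 0.3, 0.0, 0.1]

fused, weights = attention_fuse([image_seq, audio, formation])
```

In a full classifier, the fused vector would feed a classification head over the event classes, and each modality embedding would come from its own encoder.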

Tags: Multi-modal video, sports events classification, video analysis, transformer

