HIERARCHICAL TRANSFORMER FOR MULTI-LABEL TRAILER GENRE CLASSIFICATION
Zihui Cai (School of Cyber Science and Engineering, Wuhan University); Hongwei Ding (School of Cyber Science and Engineering, Wuhan University); Xuemeng Wu (School of Cyber Science and Engineering, Wuhan University); Mohan Xu (School of Cyber Science and Engineering, Wuhan University); Xiaohui Cui (School of Cyber Science and Engineering, Wuhan University)
Determining the genres of a trailer is a challenging multi-label classification task. Previous studies have mostly relied on CNNs or RNNs. Recently, Transformers, which are built on the attention mechanism, have outperformed CNNs and RNNs in many research fields. Inspired by this, we propose a Hierarchical Transformer (HT) that can process both the frame sequence (HT-F) and the audio (HT-A) of a trailer. A feature compression module is inserted into HT-F, and HT-A processes each audio spectrogram segment as a whole; both choices effectively reduce the amount of data processed by the second Transformer. To reduce training cost and improve performance, we initialize some of HT's parameters with pre-trained weights from related fields and use the limited resources to train the remaining parameters. Experiments show that our best model outperforms state-of-the-art methods on several comprehensive metrics.
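
To give a rough sense of why a two-stage hierarchy can reduce the work left for the second Transformer, the sketch below counts query-key pairs in full self-attention for a flat design versus a hierarchical one. It is an illustration under stated assumptions, not the architecture from the paper: it assumes the first stage attends within each frame and hands a single summary vector per frame to the second stage, and the function names and the frame and patch counts are hypothetical.

```python
def attention_pairs(num_tokens: int) -> int:
    """Query-key pairs evaluated by one full self-attention layer."""
    return num_tokens * num_tokens


def flat_cost(num_frames: int, patches_per_frame: int) -> int:
    """A single Transformer attending over every patch of every frame."""
    return attention_pairs(num_frames * patches_per_frame)


def hierarchical_cost(num_frames: int, patches_per_frame: int) -> int:
    """Two-stage cost under the assumption stated in the lead-in.

    Stage 1 attends within each frame separately; stage 2 attends over
    one summary vector per frame (an assumption, not the paper's spec).
    """
    stage_one = num_frames * attention_pairs(patches_per_frame)
    stage_two = attention_pairs(num_frames)
    return stage_one + stage_two


if __name__ == "__main__":
    # Hypothetical sizes chosen for illustration; not taken from the paper.
    frames, patches = 32, 196
    print("flat:", flat_cost(frames, patches))
    print("hierarchical:", hierarchical_cost(frames, patches))
```

Under these assumptions the second stage sees only as many tokens as there are frames, so its attention cost grows with the trailer length alone rather than with the total number of patches.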