Attention Mixup: An Accurate Mixup Scheme based on Interpretable Attention Mechanism for Multi-label Audio Classification
Wuyang Liu (School of Cyber Science and Engineering, Wuhan University); Yanzhen Ren (Computer School of Wuhan University); Jingru Wang (School of Cyber Science and Engineering, Wuhan University)
SPS
Mixup has proven to be an efficient data augmentation method for audio classification tasks. The original mixup scheme directly mixes the waveforms of two random samples, which not only ignores the temporal distribution of sound events but may also interfere with the original sound events in the other sample. This paper proposes Attention Mixup (AMU), which selects for mixup only those segments that contain sound events, rather than mixing the entire samples. AMU utilizes the attention maps of a pretrained audio classification Vision Transformer (ViT) to identify the patches on the spectrogram that are useful for classification, and then selects the regions for mixup according to one of three strategies. Experimental results show a remarkable improvement (+1.9 mAP) over state-of-the-art AudioSet classification methods with either a CNN or ViT backbone. Further experiments show that AMU achieves this performance gain by improving accuracy on short events (0.1 s to 2 s) by an average of 6.8% while maintaining accuracy on longer events.
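The core idea of the abstract, mixing only attention-selected event regions instead of whole samples, can be sketched as follows. This is a minimal illustrative sketch under assumed conventions, not the paper's implementation: the function name `attention_mixup`, the min-max normalization, and the fixed attention threshold are all assumptions introduced here for clarity.

```python
import numpy as np

def attention_mixup(spec_a, spec_b, attn_b, lam=0.5, threshold=0.5):
    """Illustrative sketch of attention-guided mixup (not the paper's exact method).

    Mixes spectrogram B into spectrogram A only where B's attention map
    suggests a sound event, leaving the rest of A untouched.
    """
    # Normalize the attention map to [0, 1] (assumed convention).
    attn = (attn_b - attn_b.min()) / (attn_b.max() - attn_b.min() + 1e-8)
    # Keep only patches whose attention exceeds the threshold,
    # i.e. regions likely to contain a sound event in sample B.
    mask = (attn >= threshold).astype(spec_a.dtype)
    # Mix B into A only inside the masked regions; outside them,
    # A is preserved, so its original events are not disturbed.
    mixed = spec_a * (1.0 - lam * mask) + spec_b * (lam * mask)
    return mixed, mask
```

In contrast to vanilla mixup, which would blend every time-frequency bin of both samples, this sketch interpolates only where the mask is nonzero; the three region-selection strategies mentioned in the abstract would replace the simple thresholding step.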