RAT: Radial Attention Transformer for Singing Technique Recognition
Guan-Yuan Chen (National Tsing Hua University); Ya-Fen Yeh (National Tsing Hua University); Von-Wun Soo (National Tsing Hua University)
Singing techniques are important skills for professional vocal performance and usually involve dedicated fluctuations of timbre, pitch, duration, and loudness. Recognizing types of singing techniques can be quite challenging because 1) the time-frequency features in singing are highly dynamic and may span a long range of the audio signal; 2) different singing techniques, such as vibrato and trill, tend to have similar local features; and 3) the distribution of singing technique datasets is long-tailed. To address these problems, we propose a novel Radial Attention Transformer (RAT) with a Radial Attention (RA) module that can capture fine-grained local features as well as long-range inter-dependencies of audio features. Experimental results show that the proposed method, RAT with Adaptive Logit Adjustment (ALA) loss, significantly outperforms previous state-of-the-art models (Convolutional Neural Networks and Deformable CNN) on singing technique recognition tasks.
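The abstract does not define the ALA loss, so the sketch below is not the authors' method. As a rough illustration of the general idea behind logit adjustment for long-tailed data, it implements the standard, non-adaptive logit-adjusted cross-entropy, where each logit is shifted by the log of its class prior before the softmax. The function name, the `tau` parameter, and the class counts are hypothetical and chosen only for this example.

```python
import math

def logit_adjusted_cross_entropy(logits, label, class_counts, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(class prior).

    Assumption: this is the plain (non-adaptive) logit adjustment,
    used here only to illustrate the idea; it is not the paper's ALA loss.
    """
    total = sum(class_counts)
    # Shift each logit by the log prior of its class, estimated from counts.
    adjusted = [z + tau * math.log(n / total)
                for z, n in zip(logits, class_counts)]
    # Numerically stable log-sum-exp for the softmax normalizer.
    m = max(adjusted)
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[label]

# Hypothetical long-tailed class counts: one head class, two rarer classes.
counts = [900, 90, 10]
logits = [0.0, 0.0, 0.0]  # an uninformative prediction

head_loss = logit_adjusted_cross_entropy(logits, 0, counts)
tail_loss = logit_adjusted_cross_entropy(logits, 2, counts)
plain_loss = logit_adjusted_cross_entropy(logits, 2, counts, tau=0.0)
```

With uninformative logits, the adjusted loss is larger for the rare class than for the frequent one, so training pushes the model to give rare classes a larger margin. Setting `tau=0.0` recovers ordinary cross-entropy.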