MT-GCN for Multi-Label Audio Tagging with Noisy Labels
Harsh Shrivastava, Yifang Yin, Rajiv Ratn Shah, Roger Zimmermann
Multi-label audio tagging is the task of predicting the types of sounds occurring in an audio clip. Recently, large-scale audio datasets such as Google's AudioSet have allowed researchers to apply deep learning techniques to this task, but at the cost of label noise in the datasets. Audio datasets such as AudioSet are usually built following a hierarchical structure, known as an ontology, which captures the relationships between different sound events using domain knowledge. However, existing audio tagging methods fail to exploit this domain knowledge about label relationships, leaving their models sensitive to label noise. We therefore present MT-GCN, a Multi-task Learning based Graph Convolutional Network that learns domain knowledge from the ontology. In our proposed method, the relationships between sound events are described by a graph. We propose two ontology-based graph construction methods and conduct extensive experiments on the FSDKaggle2019 dataset. The experimental results show that our approach outperforms the baseline methods by a significant margin.
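The abstract does not give implementation details, so the following is only a minimal sketch of one way an ontology could be turned into a label graph and fed through a single GCN propagation step (Kipf and Welling style). The toy ontology edges, embedding sizes, and variable names are all hypothetical illustrations, not the paper's actual construction.

```python
import numpy as np

# Hypothetical toy ontology: (parent, child) sound-event pairs.
# The real AudioSet / FSDKaggle2019 ontology is far larger.
ontology_edges = [
    ("Music", "Guitar"),
    ("Music", "Drum"),
    ("Animal", "Dog"),
    ("Animal", "Cat"),
]
labels = sorted({name for edge in ontology_edges for name in edge})
idx = {name: i for i, name in enumerate(labels)}
n = len(labels)

# Symmetric adjacency built from parent-child links -- one possible
# graph-construction choice (the paper proposes two ontology-based variants).
A = np.zeros((n, n))
for parent, child in ontology_edges:
    A[idx[parent], idx[child]] = 1.0
    A[idx[child], idx[parent]] = 1.0

# Standard GCN propagation: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)
A_hat = A + np.eye(n)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

rng = np.random.default_rng(0)
H = rng.normal(size=(n, 16))   # initial label embeddings (e.g., word vectors)
W = rng.normal(size=(16, 8))   # learnable weight matrix of one GCN layer

H_next = np.maximum(A_norm @ H @ W, 0.0)  # one GCN layer with ReLU
print(H_next.shape)  # (num_labels, 8): per-label representations for tagging
```

In such a setup, the label representations produced by the GCN would typically be combined with audio-clip embeddings from a separate encoder to score each sound class; how MT-GCN combines them and shares parameters across tasks is described in the full paper.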