Learning With Out-Of-Distribution Data For Audio Classification
Turab Iqbal, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang
In supervised machine learning, the standard assumptions of data and label integrity are not always satisfied, whether due to cost constraints or other practical limitations. In this paper, we investigate a case of this for classification tasks in which the dataset is corrupted with out-of-distribution (OOD) instances: data that does not belong to any of the target classes. We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning. The proposed method uses an auxiliary classifier, trained on data that is known to be in-distribution, to perform the detection and relabelling. The amount of clean data required for this is shown to be small. Experiments are carried out on the FSDnoisy18k audio dataset, in which OOD instances are highly prevalent. The proposed method is shown to improve classification performance by a significant margin for convolutional neural networks, and comparisons with other techniques are similarly encouraging.
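To make the detect-and-relabel idea more concrete, the sketch below shows one plausible realisation under stated assumptions: an auxiliary classifier is fitted on the small clean in-distribution subset, and any noisy-set instance on which it is sufficiently confident is relabelled to the predicted target class. The function name `relabel_ood`, the confidence threshold of 0.7, and the use of a scikit-learn `LogisticRegression` as a stand-in for the paper's CNN-based auxiliary classifier are all illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def relabel_ood(clean_X, clean_y, noisy_X, noisy_y, threshold=0.7):
    """Train an auxiliary classifier on the small clean (in-distribution)
    subset, then use its confidence on the noisy set to decide whether an
    instance should be relabelled to a predicted target class or kept with
    its original, possibly out-of-distribution, label.

    NOTE: LogisticRegression and the 0.7 threshold are placeholder choices
    for illustration; the paper uses a stronger (CNN) auxiliary classifier.
    """
    aux = LogisticRegression(max_iter=1000)
    aux.fit(clean_X, clean_y)

    probs = aux.predict_proba(noisy_X)           # per-class confidence scores
    conf = probs.max(axis=1)                     # highest class probability
    pred = aux.classes_[probs.argmax(axis=1)]    # predicted in-distribution class

    # Relabel only the instances the auxiliary classifier is confident about;
    # everything else keeps its original label rather than being discarded.
    return np.where(conf >= threshold, pred, noisy_y)
```

In practice the clean subset can be much smaller than the noisy set, which is the regime the abstract refers to when it notes that only a small amount of known in-distribution data is needed.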