Semantically-informed Deep Neural Networks for sound recognition
Michele Esposito (Maastricht University); Giancarlo Valente (Maastricht University); Yenisel Plasencia-Calaña (Maastricht University); Michel Dumontier (Maastricht University); Bruno L. Giordano (CNRS); Elia Formisano (Maastricht University)
SPS
Deep neural networks (DNNs) for sound recognition learn to categorize a barking sound as "dog" and a meowing sound as "cat", but they do not exploit the information inherent in the semantic relations between classes (e.g., both are animal vocalisations). Cognitive neuroscience research, however, suggests that human listeners automatically exploit higher-level semantic information about sound sources in addition to acoustic information. Inspired by this notion, we introduce a DNN (semDNN) that learns to recognize sounds and simultaneously learns the semantic relations between their sources. A comparison of semDNN with a homologous network trained with categorical labels (catDNN) revealed that semDNN produces semantically more accurate labelling than catDNN in sound recognition tasks and that semDNN embeddings preserve higher-level semantic relations between sound sources. Importantly, through a model-based analysis of human dissimilarity ratings of natural sounds, we show that semDNN approximates the behaviour of human listeners better than catDNN and several other DNN and NLP comparison models.
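The contrast between categorical and semantic training targets can be illustrated with a toy sketch. The abstract does not specify how semDNN encodes semantic relations or which loss it uses, so everything below is an assumption made for illustration: the three-dimensional vectors in `SEMANTIC` are hand-picked rather than taken from any language model, and `one_hot`, `cosine`, `cosine_distance`, and `nearest_label` are hypothetical helper names. The sketch only shows that one-hot labels treat all classes as equally unrelated, whereas continuous semantic targets can place "dog" closer to "cat" than to "car".

```python
import math

# Hypothetical 3-d "semantic" vectors, hand-picked for illustration only.
# They are NOT from the paper; a real system would obtain such vectors
# from some pretrained language model.
SEMANTIC = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.1],
    "car": [0.1, 0.1, 0.9],
}
CLASSES = list(SEMANTIC)


def one_hot(label):
    """Categorical target: 1 at the class index, 0 elsewhere."""
    return [1.0 if c == label else 0.0 for c in CLASSES]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """One possible training objective for semantic targets (an assumption)."""
    return 1.0 - cosine(u, v)


def nearest_label(embedding):
    """Decode a predicted embedding to the class with the most similar vector."""
    return max(CLASSES, key=lambda c: cosine(embedding, SEMANTIC[c]))


# One-hot targets: every pair of distinct classes is equally dissimilar.
print(cosine(one_hot("dog"), one_hot("cat")))  # 0.0
print(cosine(one_hot("dog"), one_hot("car")))  # 0.0

# Semantic targets: "dog" is closer to "cat" than to "car".
print(cosine(SEMANTIC["dog"], SEMANTIC["cat"]) > cosine(SEMANTIC["dog"], SEMANTIC["car"]))  # True

# A made-up network output is decoded by nearest neighbour in the semantic space.
print(nearest_label([0.7, 0.9, 0.0]))  # cat
```

Under these assumptions, a network that confuses a dog with a cat incurs a small `cosine_distance` penalty, while confusing a dog with a car incurs a large one; with one-hot targets both errors look the same.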