SARdBScene: Dataset and ResNet Baseline for Audio Scene Source Counting and Analysis
Michael Nigro (Toronto Metropolitan University); Sri Krishnan (Ryerson University)
SPS
This paper introduces a first-of-its-kind dataset for audio scene analysis (ASA) and presents a baseline approach for audio source counting. SARdBScene is designed to promote research on audio source counting, a relatively new ASA task, and to provide a comprehensive dataset that covers a variety of scenarios and audio-based tasks. It contains 80 hours of audio scene mixtures depicting four distinct environments, with detailed annotations that make it a unique curated collection in the audio analysis landscape. Our ResNet baseline establishes state-of-the-art accuracies of 77.3% for audio source counting with up to 12 sources and 85.7% for speaker counting with up to 4 speakers.