SARdBScene: Dataset and ResNet Baseline for Audio Scene Source Counting and Analysis

Michael Nigro (Toronto Metropolitan University); Sri Krishnan (Ryerson University)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
06 Jun 2023

This paper introduces SARdBScene, a first-of-its-kind dataset for audio scene analysis (ASA), and presents a baseline approach for audio source counting. SARdBScene was developed to promote research on audio source counting, a relatively new ASA task, and to provide a comprehensive dataset covering a variety of scenarios and audio-based tasks. It contains 80 hours of audio scene mixtures depicting four distinct environments, with detailed annotations that make it a unique curated collection in the audio analysis landscape. Our ResNet-based baseline establishes state-of-the-art accuracies of 77.3% for audio source counting with up to 12 sources and 85.7% for speaker counting with up to 4 speakers.
