SARdBScene: Dataset and ResNet Baseline for Audio Scene Source Counting and Analysis

Michael Nigro (Toronto Metropolitan University); Sri Krishnan (Ryerson University)

06 Jun 2023

This paper introduces a first-of-its-kind dataset for audio scene analysis (ASA) and presents a baseline approach for audio source counting. SARdBScene was developed to promote research on audio source counting, a relatively new ASA task, and to provide a comprehensive dataset that covers a variety of scenarios and audio-based tasks. It contains 80 hours of audio scene mixtures depicting four distinct environments, with detailed annotations that make it a unique collection of curated data in the audio analysis landscape. Our ResNet baseline establishes state-of-the-art results: 77.3% accuracy for audio source counting with up to 12 sources and 85.7% accuracy for speaker counting with up to 4 speakers.

Tags:

Modeling, analysis and synthesis of acoustic environments

Venue:

IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person conference)
