AN EXPERIMENTAL STUDY ON SOUND EVENT LOCALIZATION AND DETECTION UNDER REALISTIC TESTING CONDITIONS
Shutong Niu (University of Science and Technology of China); Jun Du (University of Science and Technology of China); Qing Wang (University of Science and Technology of China); Li Chai (University of Science and Technology of China); Huaxin Wu (iFlytek Research); Zhaoxu Nian (University of Science and Technology of China); Lei Sun (University of Science and Technology of China); Yi Fang (iFlytek Research); Jia Pan (University of Science and Technology of China); Chin-Hui Lee (Georgia Institute of Technology)
SPS
We study four data augmentation (DA) techniques and two model architectures on realistic data for sound event localization and detection (SELD). First, using a ResNet-Conformer (RC) model, we compare the four DA approaches on the realistic DCASE 2022 SELD test set, which is difficult to handle because of room reverberation and overlapping audio in spontaneous recordings. Experimental results show that, except for audio channel swapping (ACS), the three other DA methods that work well on the simulated SELD data set are no longer effective, owing to mismatches between simulated and realistic conditions. Next, using ACS-based augmentation, two improved ResNet-Conformer networks further enhance SELD performance in realistic conditions. By combining these two sets of techniques, our overall system ranked first in the SELD task of the DCASE 2022 Challenge.
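To make the idea behind ACS concrete, the following is a minimal sketch and not the authors' implementation. It assumes first-order Ambisonics (FOA) input with channels ordered W, Y, Z, X and an idealized plane-wave encoding; the helper names `encode_foa` and `swap_channels`, and the three transformations shown, are hypothetical choices for illustration. The abstract does not specify which transformations the system uses. The point is that swapping or negating channels yields new, physically consistent audio as long as the direction-of-arrival labels are transformed in the same way.

```python
import math

def encode_foa(signal, azi, ele):
    """Idealized FOA plane-wave encoding (assumed channel order: W, Y, Z, X)."""
    w = list(signal)
    y = [s * math.sin(azi) * math.cos(ele) for s in signal]
    z = [s * math.sin(ele) for s in signal]
    x = [s * math.cos(azi) * math.cos(ele) for s in signal]
    return [w, y, z, x]

def _neg(channel):
    """Negate every sample of one channel."""
    return [-v for v in channel]

def swap_channels(foa, azi, ele, mode):
    """Apply one illustrative channel transformation and update the labels.

    Returns (new_foa, new_azimuth, new_elevation), angles in radians.
    """
    w, y, z, x = foa
    if mode == "flip_azimuth":      # azi -> -azi: negate Y
        return [w, _neg(y), z, x], -azi, ele
    if mode == "flip_elevation":    # ele -> -ele: negate Z
        return [w, y, _neg(z), x], azi, -ele
    if mode == "rotate_90":         # azi -> azi + pi/2: Y' = X, X' = -Y
        return [w, x, z, _neg(y)], azi + math.pi / 2, ele
    raise ValueError(f"unknown mode: {mode}")

# Usage: augment one clip and its direction-of-arrival label together.
signal = [0.5, -1.0, 0.25, 0.75]
foa = encode_foa(signal, azi=0.3, ele=-0.2)
aug_foa, aug_azi, aug_ele = swap_channels(foa, 0.3, -0.2, "rotate_90")
```

Because each transformation only permutes or negates existing channels, it adds no simulated acoustics to the recording, which is one plausible reading of why ACS transfers to realistic data better than the other three methods.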