THE NERCSLIP-USTC SYSTEM FOR THE L3DAS23 CHALLENGE TASK2: 3D SOUND EVENT LOCALIZATION AND DETECTION (SELD)
Haoyin Yan (University of Science and Technology of China); Haitao Xu (University of Science and Technology of China); Jie Zhang (University of Science and Technology of China); Qing Wang (University of Science and Technology of China)
SPS
Sound event localization and detection (SELD) aims to identify the temporal activity of each sound event class from a known set and to estimate the locations of the active events. The task remains challenging, especially when acoustic events overlap. In this work, a robust network architecture is proposed together with data augmentation techniques to improve SELD performance. The architecture combines ResNet and Conformer blocks to model both local and global patterns. To address data sparsity in SELD, SpecAugment, mixup and audio channel swapping (ACS) are adopted. The proposed system was evaluated on Task 2 of the L3DAS23 challenge, where it ranked second and achieved significant improvements over the baseline.
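The three augmentations named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions. The input is taken to be first-order Ambisonics (FOA) in ACN channel order (W, Y, Z, X), and the location labels are unit Cartesian direction vectors. ACS is shown as two of the possible channel transforms: a +90° azimuth rotation and an elevation flip, each paired with the matching label transform. Mixup is shown in its standard form, mixing inputs and targets with a Beta-distributed weight. SpecAugment is reduced to one frequency mask and one time mask, with no time warping. The function names (`encode_foa`, `acs_rotate_90`, `acs_flip_elevation`, `mixup`, `spec_augment`) and all parameter defaults are hypothetical.

```python
import numpy as np

# Assumption: FOA channels in ACN order (W, Y, Z, X); labels are unit (x, y, z) vectors.
W, Y, Z, X = 0, 1, 2, 3


def encode_foa(signal, azi, ele):
    """Encode a mono signal as a plane wave from (azi, ele) into FOA (W gain = 1)."""
    return np.stack([
        signal,
        signal * np.sin(azi) * np.cos(ele),
        signal * np.sin(ele),
        signal * np.cos(azi) * np.cos(ele),
    ])


def acs_rotate_90(foa, xyz):
    """ACS: rotate the sound field by +90 degrees in azimuth (phi -> phi + pi/2).

    Channels: Y' = X, X' = -Y.  Labels: (x', y') = (-y, x).
    """
    out = foa.copy()
    out[Y] = foa[X]
    out[X] = -foa[Y]
    new_xyz = xyz.copy()
    new_xyz[..., 0] = -xyz[..., 1]
    new_xyz[..., 1] = xyz[..., 0]
    return out, new_xyz


def acs_flip_elevation(foa, xyz):
    """ACS: mirror the sound field through the horizontal plane (theta -> -theta)."""
    out = foa.copy()
    out[Z] = -foa[Z]
    new_xyz = xyz.copy()
    new_xyz[..., 2] = -xyz[..., 2]
    return out, new_xyz


def mixup(x1, y1, x2, y2, alpha=0.5, rng=None):
    """Standard mixup: a convex combination of two examples and their targets."""
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2, lam


def spec_augment(spec, max_f=8, max_t=20, rng=None):
    """Zero one random frequency band and one random time span.

    spec has shape (channels, time, freq); the same masks are applied to every
    channel so inter-channel spatial cues stay consistent. Requires
    max_f <= freq bins and max_t <= time frames.
    """
    rng = rng if rng is not None else np.random.default_rng()
    out = spec.copy()
    n_t, n_f = spec.shape[-2], spec.shape[-1]
    f = rng.integers(0, max_f + 1)           # mask width in bins
    f0 = rng.integers(0, n_f - f + 1)        # mask start bin
    out[..., f0:f0 + f] = 0.0
    t = rng.integers(0, max_t + 1)           # mask length in frames
    t0 = rng.integers(0, n_t - t + 1)        # mask start frame
    out[..., t0:t0 + t, :] = 0.0
    return out
```

Because each ACS transform only permutes FOA channels and flips their signs, it is cheap to apply and produces a physically consistent new direction for every event. This makes it a natural remedy for sparse spatial coverage in the training set.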