3D audio signal processing systems for speech enhancement and sound localization and detection
Jisheng Bai (School of Marine Science and Technology, Northwestern Polytechnical University); Siwei Huang (JLESS); Han Yin (JLESS); Mou Wang (Northwestern Polytechnical University); Yafei Jia (School of Marine Science and Technology, Northwestern Polytechnical University); Jianfeng Chen (School of Marine Science and Technology, Northwestern Polytechnical University)
SPS
The L3DAS23 challenge of the ICASSP Signal Processing Grand Challenge encourages research on 3D audio signal processing, including 3D speech enhancement (SE) and 3D sound event localization and detection (SELD). In this paper, we propose a two-stage system based on DPRNN and UNet for the SE task and a Conformer-based system for the SELD task. The proposed SE and SELD systems are evaluated on the L3DAS23 blind test sets. Results show that the proposed methods achieve state-of-the-art performance for 3D SE and SELD.