TOWARDS LOW-DISTORTION MULTI-CHANNEL SPEECH ENHANCEMENT: THE ESPNET-SE SUBMISSION TO THE L3DAS22 CHALLENGE

Yen-Ju Lu, Xuankai Chang, Zhong-Qiu Wang, Shinji Watanabe, Samuele Cornell, Wangyou Zhang, Chenda Li, Zhaoheng Ni

Length: 00:09:30

07 May 2022

This paper describes our submission to Task 1 of the L3DAS22 Challenge, which consists of speech enhancement with 3D Ambisonic microphones. The core of our approach combines deep neural network (DNN) driven complex spectral mapping with linear beamformers such as the multi-frame multi-channel Wiener filter. The proposed system consists of two DNNs with a linear beamformer in between. Both DNNs are trained to perform complex spectral mapping, using a combination of waveform and magnitude spectrum losses. The signal estimated by the first DNN is used to drive the linear beamformer, and the beamforming result, together with this enhanced signal, is used as an extra input to the second DNN, which refines the estimate. The linear beamformer and the second DNN are then run iteratively, each pass starting from the newly estimated signal. The proposed method ranked first in the challenge, achieving a ranking metric of 0.984 on the evaluation set, versus 0.833 for the challenge baseline.
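The DNN, beamformer, DNN loop described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the function names `mcwf` and `enhance`, the argument `n_iters`, and the two stand-in "DNNs" are hypothetical, and a single-frame multi-channel Wiener filter is substituted for the multi-frame variant named in the abstract to keep the example short.

```python
import numpy as np

def mcwf(Y, S_hat, eps=1e-6):
    """Single-frame multi-channel Wiener filter, one filter per frequency.

    Y:     (C, F, T) complex STFT of the multi-channel mixture.
    S_hat: (F, T) complex STFT of the current target estimate.
    Returns the (F, T) beamformed signal w^H y, where w minimises
    the mean squared error between w^H y and S_hat.
    """
    C, F, T = Y.shape
    out = np.empty((F, T), dtype=complex)
    for f in range(F):
        Yf = Y[:, f, :]                          # (C, T)
        Phi = Yf @ Yf.conj().T / T               # mixture covariance (C, C)
        load = eps * (np.trace(Phi).real / C + 1e-12)
        Phi = Phi + load * np.eye(C)             # diagonal loading for stability
        p = Yf @ S_hat[f].conj() / T             # cross-correlation E[y s*]
        w = np.linalg.solve(Phi, p)              # w = Phi^{-1} p
        out[f] = w.conj() @ Yf                   # apply w^H to every frame
    return out

def enhance(Y, dnn1, dnn2, n_iters=2):
    """DNN1 -> (beamformer -> DNN2) repeated n_iters times."""
    S_hat = dnn1(Y)                              # first estimate
    for _ in range(n_iters):
        bf = mcwf(Y, S_hat)                      # beamformer driven by the estimate
        S_hat = dnn2(Y, bf, S_hat)               # refinement with extra inputs
    return S_hat

# Stand-ins for the trained networks (assumptions, for illustration only).
dnn1 = lambda Y: Y[0]                            # pass through channel 0
dnn2 = lambda Y, bf, S_hat: 0.5 * (bf + S_hat)   # average the two estimates

rng = np.random.default_rng(0)
Y = rng.standard_normal((4, 5, 50)) + 1j * rng.standard_normal((4, 5, 50))
S = enhance(Y, dnn1, dnn2, n_iters=2)
print(S.shape)
```

In a real system the two stand-ins would be replaced by trained complex spectral mapping networks, and the number of passes through the beamformer and second DNN would be chosen empirically.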

Tags:

multi-microphone complex spectral mapping

deep learning

multi-channel speech enhancement

beamforming

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
