DNN-Supported Mask-Based Convolutional Beamforming for Simultaneous Denoising, Dereverberation, and Source Separation
Tomohiro Nakatani, Riki Takahashi, Tsubasa Ochiai, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Shoko Araki
In this article, we investigate an integrated mask-based convolutional beamforming method for performing simultaneous denoising, dereverberation, and source separation. Conventionally, it is difficult for neural network-supported mask-based source separation to perform denoising and dereverberation at the same time, and for spatial clustering-based source separation to reliably solve the permutation problem in the presence of noise and reverberation. This greatly limits the application of mask-based source separation. To address this issue, we propose a method to integrate state-of-the-art techniques for mask-based beamforming into a single optimization framework. These techniques include frequency-domain Convolutional Neural Network-based utterance-level Permutation Invariant Training with a large receptive field (CNN-uPIT), noisy Complex Gaussian Mixture Model-based spatial clustering (noisyCGMM), and Weighted Power minimization Distortionless response (WPD) convolutional beamforming. Our experiments show that all these components are essential for accurately estimating desired speech signals in noisy reverberant multisource environments.
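To give a concrete feel for the WPD convolutional beamforming component, the following is a minimal single-frequency-bin NumPy sketch, assuming a DNN-estimated time-frequency mask for the target source is already available. The function name, its arguments, the mask-based power estimate, and the eigenvector-based steering-vector estimate are illustrative assumptions for this sketch, not the authors' implementation or its exact parameterization.

```python
import numpy as np

def wpd_beamformer(X, mask, delay=3, taps=10, eps=1e-6):
    """Illustrative WPD sketch for one frequency bin.
    X: (n_mics, n_frames) complex STFT of the mixture.
    mask: (n_frames,) DNN-estimated mask for the target source."""
    n_mics, n_frames = X.shape

    # Stack the current frame with delayed frames to form the
    # convolutional (multi-frame) observation vector x_bar_t.
    frames = [X]
    for d in range(delay, delay + taps):
        Xd = np.zeros_like(X)
        Xd[:, d:] = X[:, :n_frames - d]
        frames.append(Xd)
    X_bar = np.concatenate(frames, axis=0)  # ((taps+1)*n_mics, n_frames)

    # Time-varying target power, estimated here from the masked mixture.
    lam = np.maximum(np.mean(np.abs(mask * X) ** 2, axis=0), eps)

    # Power-weighted spatio-temporal covariance: R = sum_t x_bar_t x_bar_t^H / lam_t.
    R = (X_bar / lam) @ X_bar.conj().T / n_frames
    R += eps * np.eye(R.shape[0])

    # Steering vector: principal eigenvector of the masked target covariance,
    # zero-padded over the delayed taps (distortionless at the current frame).
    Phi = (mask * X) @ X.conj().T / np.maximum(mask.sum(), eps)
    v = np.linalg.eigh(Phi)[1][:, -1]
    v_bar = np.concatenate([v, np.zeros(taps * n_mics, dtype=X.dtype)])

    # WPD solution: w = R^{-1} v_bar / (v_bar^H R^{-1} v_bar),
    # i.e. minimize the weighted output power subject to w^H v_bar = 1.
    Rinv_v = np.linalg.solve(R, v_bar)
    w = Rinv_v / (v_bar.conj() @ Rinv_v)

    # Enhanced target estimate for this frequency bin.
    return w.conj() @ X_bar
```

Because the filter spans both the current frame and delayed frames, a single linear filter of this form can jointly suppress noise and late reverberation, which is the role WPD plays in the integrated framework described above.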