Factorized MVDR Deep Beamforming for Multi-Channel Speech Enhancement

Hansol Kim (GIST); Kyeongmuk Kang (GIST); Jong Won Shin (Gwangju Institute of Science and Technology)

09 Jun 2023

Traditionally, adaptive beamformers such as the minimum-variance distortionless response (MVDR) beamformer and the generalized eigenvalue beamformer have been widely used for multi-channel speech enhancement, typically followed by a single-channel postfilter. Recently, several approaches have been proposed that use deep neural networks (DNNs) to enhance the signals from which the speech and noise spatial covariance matrices (SCMs) are estimated and to process the beamformer outputs. However, preprocessing the signals used for SCM estimation may disrupt the phase relations among the input signals, and the time averages used to estimate the speech and noise SCMs may not be optimal for beamformer performance even when the estimated signals are close to the ground truth. In this letter, we propose a deep beamforming approach that estimates factors of the MVDR beamformer using a DNN, circumventing the difficulty of speech and noise SCM estimation. We formulate the MVDR beamformer in a factorized form involving two complex factors and estimate these factors using a DNN trained with a cost function that compares the beamformed signal with the original clean speech. Experimental results showed that the proposed factorized MVDR beamformer could mimic the characteristics of the MVDR beamformer computed with the true relative transfer function and noise SCM, and that it outperformed the MVDR beamformer with deep learning-based pre- and post-processing in terms of perceptual evaluation of speech quality scores.
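For readers unfamiliar with the baseline, the following is a minimal NumPy sketch of the textbook MVDR beamformer for a single frequency bin, w = Φ_n⁻¹ h / (hᴴ Φ_n⁻¹ h), where h is the relative transfer function and Φ_n is the noise SCM. It is not the factorized form or the DNN proposed in the paper, neither of which is specified in the abstract. The function names, the four-microphone setup, and the random test data are assumptions made purely for illustration.

```python
import numpy as np


def mvdr_weights(noise_scm, rtf):
    """Textbook MVDR weights for one frequency bin (illustrative helper)."""
    numerator = np.linalg.solve(noise_scm, rtf)  # Phi_n^{-1} h
    denominator = np.vdot(rtf, numerator)        # h^H Phi_n^{-1} h
    return numerator / denominator


def make_example(num_mics=4, seed=0):
    """Build a random RTF and a Hermitian positive-definite noise SCM (assumed data)."""
    rng = np.random.default_rng(seed)
    rtf = rng.standard_normal(num_mics) + 1j * rng.standard_normal(num_mics)
    rtf = rtf / rtf[0]  # express the transfer function relative to microphone 0
    a = rng.standard_normal((num_mics, num_mics)) \
        + 1j * rng.standard_normal((num_mics, num_mics))
    noise_scm = a @ a.conj().T + 1e-3 * np.eye(num_mics)
    return noise_scm, rtf


def output_noise_power(weights, noise_scm):
    """Noise power at the beamformer output, w^H Phi_n w."""
    return np.real(np.vdot(weights, noise_scm @ weights))


noise_scm, rtf = make_example()
w_mvdr = mvdr_weights(noise_scm, rtf)

# Distortionless constraint: the response toward the target, w^H h, equals 1.
distortionless_gain = np.vdot(w_mvdr, rtf)

# A matched-filter beamformer that satisfies the same constraint, for comparison.
w_matched = rtf / np.vdot(rtf, rtf)

mvdr_noise = output_noise_power(w_mvdr, noise_scm)
matched_noise = output_noise_power(w_matched, noise_scm)
```

Among all weight vectors satisfying wᴴh = 1, the MVDR solution minimizes the output noise power, so `mvdr_noise` should not exceed `matched_noise`. In practice h and Φ_n must be estimated from noisy recordings, which is the estimation difficulty the abstract describes.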

Tags:

Image, Video, and Multidimensional Signal Processing

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
