IMPROVING DUAL-MICROPHONE SPEECH ENHANCEMENT BY LEARNING CROSS-CHANNEL FEATURES WITH MULTI-HEAD ATTENTION
Xinmeng Xu, Rongzhi Gu, Yuexian Zou
SPS
Hand-crafted spatial features, such as the inter-channel intensity difference (IID) and the inter-channel phase difference (IPD), play a fundamental role in recent deep-learning-based dual-microphone speech enhancement (DMSE) systems. However, the mutual relationship between these artificially designed spatial features and the spectral features is hard to learn in end-to-end DMSE. In this work, a novel DMSE architecture based on a multi-head cross-attention convolutional recurrent network (CRN) is presented. The proposed model includes a channel-independent encoding architecture for spectral estimation and a strategy for extracting cross-channel features through a multi-head cross-attention mechanism. In addition, the proposed approach equips the decoder with an extra estimator that predicts frame-level SNR under a multi-task learning framework, which is expected to avoid the speech distortion introduced by the end-to-end DMSE module. Finally, a spectral gain function is adopted to further suppress unnatural residual noise. Experimental results demonstrate the superior performance of the proposed model over several state-of-the-art models.
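To make the hand-crafted spatial features concrete, the sketch below computes IID (as a log-magnitude ratio) and IPD (as a wrapped phase difference) for paired complex STFT bins from the two microphones. This follows the common textbook definitions of these features; the exact formulation used in the paper may differ, and the toy input values are purely illustrative.

```python
import cmath
import math

def spatial_features(x1, x2, eps=1e-8):
    """Compute IID and IPD for paired complex STFT bins.

    x1, x2: lists of complex STFT values (same frame, same
            frequency bins) from microphones 1 and 2.
    Returns (iid, ipd) lists, one value per frequency bin.
    """
    iid, ipd = [], []
    for a, b in zip(x1, x2):
        # IID: log ratio of channel magnitudes (eps avoids log(0))
        iid.append(math.log((abs(a) + eps) / (abs(b) + eps)))
        # IPD: phase difference, wrapped back into (-pi, pi]
        d = cmath.phase(a) - cmath.phase(b)
        ipd.append(math.atan2(math.sin(d), math.cos(d)))
    return iid, ipd

# Toy STFT bins for one frame of a two-microphone pair
mic1 = [1 + 1j, 2 + 0j, 0 + 3j]
mic2 = [1 - 1j, 1 + 0j, 0 + 3j]
iid, ipd = spatial_features(mic1, mic2)
```

In an end-to-end DMSE system these per-bin features would be stacked with the spectral features as network input; the paper's contribution is to replace this hand-crafted concatenation with cross-channel features learned via multi-head cross-attention.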