D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement

Shengkui Zhao (Alibaba Group); Bin Ma (Alibaba, Singapore R&D Center)

06 Jun 2023

Monaural speech enhancement has been widely studied using real-valued networks. However, the input and the target are naturally complex-valued in the time-frequency (TF) domain, so a fully complex network is highly desirable for effectively modelling the sequence in the complex domain. Moreover, phase has been shown to be learnable together with magnitude using either complex masking or complex spectral mapping. Many recent studies focus on only one of these two approaches and ignore their respective performance boundaries. To address these issues, we propose a fully complex dual-path dual-decoder conformer network (D2Former). In D2Former, we form a dual-path complex TF self-attention architecture to effectively model the complex-valued TF sequence, and we boost the encoder and the decoders using a dual-path learning structure. In addition, we improve the performance boundaries of the individual targets through a joint-learning framework.
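The abstract contrasts two ways of estimating the clean complex spectrum: complex masking, where a predicted complex mask multiplies the noisy TF bins, and complex spectral mapping, where a decoder predicts the clean spectrum directly. The minimal NumPy sketch below illustrates both targets and one possible way of combining them. It assumes the noisy STFT and the two decoder outputs are already available as complex arrays of shape (F, T). The function names `apply_complex_mask` and `joint_estimate` and the averaging weight `alpha` are hypothetical illustrations, not the combination rule used in D2Former.

```python
import numpy as np


def apply_complex_mask(noisy: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Complex masking: estimate = mask * noisy, element-wise per TF bin.

    Written out with real and imaginary parts, (M_r + jM_i)(Y_r + jY_i) =
    (M_r*Y_r - M_i*Y_i) + j(M_r*Y_i + M_i*Y_r), so the magnitude is scaled by
    |M| and the phase is shifted by angle(M): phase is corrected, not reused.
    """
    est_re = mask.real * noisy.real - mask.imag * noisy.imag
    est_im = mask.real * noisy.imag + mask.imag * noisy.real
    return est_re + 1j * est_im


def joint_estimate(noisy: np.ndarray, mask: np.ndarray,
                   mapped: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical fusion of a masking decoder and a mapping decoder.

    `mapped` stands for a spectrum predicted directly by a complex spectral
    mapping decoder. The weighted average with `alpha` is an illustrative
    assumption, not the combination rule used in D2Former.
    """
    if noisy.shape != mask.shape or noisy.shape != mapped.shape:
        raise ValueError("noisy, mask and mapped must share the same (F, T) shape")
    masked = apply_complex_mask(noisy, mask)
    return alpha * masked + (1.0 - alpha) * mapped


# Toy 1 x 2 spectrogram (one frequency bin, two frames).
noisy = np.array([[1 + 1j, 2 + 0j]])
mask = np.array([[0.5 + 0j, 0 + 1j]])      # second bin: rotate phase by 90 degrees
mapped = np.array([[0.5 + 0.5j, 0 + 0j]])  # stand-in for a mapping-decoder output

print(apply_complex_mask(noisy, mask))
print(joint_estimate(noisy, mask, mapped))
```

In the toy example the purely imaginary mask value `1j` leaves the magnitude of the second bin unchanged (2) but moves its phase from 0 to π/2, which is the sense in which complex masking learns phase together with magnitude. A fixed average is only the simplest possible joint estimate; a learned or loss-level combination would be equally consistent with the abstract's description.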

Tags:

Speech enhancement and separation

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

Incorporating Visual Information Reconstruction into Progressive Learning for Optimizing Audio-Visual Speech Enhancement

Fast and Efficient Speech Enhancement with Variational Autoencoders

SINGLE-CHANNEL SPEECH ENHANCEMENT WITH DEEP COMPLEX U-NETWORKS AND PROBABILISTIC LATENT SPACE MODELS

Join the IEEE Signal Processing Society