D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement
Shengkui Zhao (Alibaba Group); Bin Ma (Alibaba, Singapore R&D Center)
SPS
Monaural speech enhancement has been widely studied using real-valued networks. However, because the input and the target are naturally complex-valued in the time-frequency (TF) domain, a fully complex network is highly desirable for effectively modelling the sequence in the complex domain. Moreover, phase has been shown to be learnable together with magnitude using either complex masking or complex spectral mapping. Many recent studies focus on only one of these two targets, ignoring their respective performance boundaries. To address the above issues, we propose a fully complex dual-path dual-decoder conformer network (D2Former). In D2Former, we form a dual-path complex TF self-attention architecture for effectively modelling the complex-valued TF sequence, and we boost the encoder and the decoders using a dual-path learning structure. In addition, we improve the performance boundary of each individual target through a joint-learning framework.
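To make the two learning targets concrete, the following is a minimal NumPy sketch, not the D2Former implementation. Complex masking multiplies the noisy TF spectrum by an estimated complex mask, while complex spectral mapping predicts the clean complex spectrum directly. The function names, the toy spectrogram size, and the joint loss (a plain sum of the two squared errors) are all assumptions made for illustration; the abstract does not specify how the two decoder outputs are combined or trained.

```python
import numpy as np


def apply_complex_mask(noisy_spec, mask):
    """Complex masking: element-wise complex multiplication of mask and input."""
    return mask * noisy_spec


def joint_loss(clean_spec, masked_est, mapped_est):
    """Hypothetical joint objective: sum of the mean squared errors of both estimates."""
    mask_err = np.mean(np.abs(clean_spec - masked_est) ** 2)
    map_err = np.mean(np.abs(clean_spec - mapped_est) ** 2)
    return mask_err + map_err


rng = np.random.default_rng(0)
shape = (4, 5)  # toy (time frames, frequency bins)

# Synthetic complex spectra standing in for STFT outputs.
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.1 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise

# An ideal complex ratio mask recovers the clean spectrum exactly;
# a masking decoder would estimate this mask from the noisy input.
ideal_mask = clean / noisy
masked_est = apply_complex_mask(noisy, ideal_mask)

# A mapping decoder would output the clean spectrum directly;
# here the clean spectrum itself stands in for a perfect prediction.
mapped_est = clean.copy()

loss_ideal = joint_loss(clean, masked_est, mapped_est)
loss_noisy = joint_loss(clean, noisy, noisy)
```

With ideal estimates the joint loss is numerically zero, whereas passing the unprocessed noisy spectrum through both branches leaves a positive residual equal to twice the noise power.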