    Length: 00:13:36
10 Jun 2021

Deep-learning-based speech enhancement degrades significantly in the face of unseen noise. To address this mismatch, we propose to learn noise-agnostic feature representations through disentanglement learning, which removes the unspecified noise factor while keeping the specified factors of variation associated with the clean speech. Specifically, a discriminator module, referred to as the disentangler, is introduced to distinguish the noise type. With an adversarial training strategy, a gradient reversal layer seeks to disentangle the noise factor and remove it from the feature representation. Experimental results show that the proposed approach achieves 5.8% and 5.2% relative improvements over the best baseline in terms of perceptual evaluation of speech quality (PESQ) and segmental signal-to-noise ratio (SSNR), respectively. Furthermore, an ablation study indicates that the proposed disentangler module is also effective in other encoder-decoder-like structures. The scripts are available on GitHub.
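
The authors' scripts are on GitHub; purely as an illustration of the gradient reversal mechanism mentioned in the abstract (not the authors' implementation), a minimal PyTorch sketch could look like the following. The `lambd` scaling factor and the names `grad_reverse` and `noise_discriminator` are illustrative assumptions.

```python
import torch


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales and negates the gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the encoder, so the
        # encoder is trained to produce features the noise discriminator cannot classify.
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)


# Hypothetical usage: pass encoder features through the reversal layer before a
# noise-type classifier, making its loss adversarial with respect to the encoder.
# noise_logits = noise_discriminator(grad_reverse(encoder_features, lambd=1.0))
```

In this adversarial setup, the discriminator is trained normally to classify the noise type, while the reversed gradient pushes the shared encoder toward noise-agnostic representations.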

Chairs:
Timo Gerkmann

