Speech enhancement with neural homomorphic synthesis
Wenbin Jiang, Zhijun Liu, Kai Yu, Fei Wen
Most deep learning-based speech enhancement methods operate directly on time-frequency representations or learned features without exploiting the model of speech production. This work proposes a new speech enhancement method based on neural homomorphic synthesis. The speech signal is first decomposed into excitation and vocal tract components via complex cepstrum analysis. Then, two complex-valued neural networks are applied to estimate the target complex spectra of the decomposed components. Finally, the time-domain speech signal is synthesized from the estimated excitation and vocal tract. Furthermore, we investigate numerous loss functions and find that the multi-resolution STFT loss, commonly used in TTS vocoders, also benefits speech enhancement. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art complex-valued neural network-based methods in terms of both PESQ and eSTOI.
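To make the homomorphic analysis step concrete, here is a minimal NumPy sketch of the classical decomposition the abstract refers to: cepstral liftering splits a speech frame into a smooth spectral envelope (vocal tract) and a residual (excitation). The function name and lifter length are illustrative, not from the paper, and the sketch uses the real cepstrum for the envelope, whereas the paper performs the full complex cepstrum analysis, which also carries phase.

```python
import numpy as np

def homomorphic_decompose(frame, lifter_len=30):
    """Illustrative homomorphic split of a windowed speech frame.

    Returns (vocal_tract, excitation), where vocal_tract is a smooth
    magnitude envelope and excitation is the complex spectral residual.
    Hypothetical sketch; the paper's pipeline uses the complex cepstrum.
    """
    spec = np.fft.fft(frame)
    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    log_mag = np.log(np.abs(spec) + 1e-8)
    cep = np.fft.ifft(log_mag).real
    # Symmetric low-quefrency lifter keeps the slowly varying part of
    # the log spectrum, i.e. the vocal-tract envelope.
    lifter = np.zeros_like(cep)
    lifter[:lifter_len] = 1.0
    lifter[-(lifter_len - 1):] = 1.0
    env_log = np.fft.fft(cep * lifter).real
    vocal_tract = np.exp(env_log)
    # The remaining fine structure (pitch harmonics, phase) is treated
    # as the excitation.
    excitation = spec / (vocal_tract + 1e-8)
    return vocal_tract, excitation
```

Similarly, the multi-resolution STFT loss mentioned in the abstract can be sketched as in vocoder training (e.g. Parallel WaveGAN-style setups): a spectral-convergence term plus a log-magnitude term, averaged over several STFT configurations. The resolution triples below are typical values, not necessarily those used in the paper.

```python
import torch
import torch.nn.functional as F

def stft_loss(x, y, fft_size, hop, win_len):
    """Spectral convergence + log-magnitude L1 at one STFT resolution."""
    window = torch.hann_window(win_len, device=x.device)
    X = torch.stft(x, fft_size, hop, win_len, window,
                   return_complex=True).abs()
    Y = torch.stft(y, fft_size, hop, win_len, window,
                   return_complex=True).abs()
    sc = torch.norm(Y - X, p="fro") / (torch.norm(Y, p="fro") + 1e-8)
    mag = F.l1_loss(torch.log(X + 1e-8), torch.log(Y + 1e-8))
    return sc + mag

def multi_resolution_stft_loss(x, y,
                               resolutions=((1024, 120, 600),
                                            (2048, 240, 1200),
                                            (512, 50, 240))):
    """Average single-resolution STFT losses over several configurations
    (fft_size, hop, win_len); assumed values, as commonly used in
    TTS vocoder training."""
    return sum(stft_loss(x, y, *r) for r in resolutions) / len(resolutions)
```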