Aiding speech harmonic recovery in DNN-based single channel noise reduction using cepstral excitation manipulation (CEM) components

Yanjue Song (Ghent University - imec); Nilesh Madhu (IDLab, Ghent University - imec)

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

09 Jun 2023

Weak harmonics of voiced speech segments are often lost during the process of noise suppression – especially at low SNRs. This leads to a distortion in the harmonic structure, and an accompanying loss in quality. In this paper, inspired by previous work on speech harmonic enhancement using statistical methods, we present a loss function component we term cepstral excitation manipulation (CEM) loss. This component is constructed based on the fundamental frequency-related cepstral coefficients. This component can be introduced in the training of state-of-the-art architectures and its benefit is benchmarked, here, on CRUSE. Experiments show that the proposed loss function component nicely supplements standard loss functions and the harmonic structure is better preserved. On average, the best system improves by 0.4 on PESQ and 0.47 on DNSMOS compared to the noisy input. Substantial improvements are primarily in low SNRs (-5 dB to 5 dB) – the range for which harmonic recovery is most required.

Tags:

Audio and speech source separation