Cascaded Time + Time-Frequency Unet For Speech Enhancement: Jointly Addressing Clipping, Codec Distortions, And Gaps

Arun Asokan Nair, Kazuhito Koishida

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 00:15:31

11 Jun 2021

Speech enhancement aims to improve speech quality by eliminating noise and distortions. While most speech enhancement methods address signal independent additive sources of noise, several degradations to speech signals are signal dependent and non-additive, like speech clipping, codec distortions, and gaps in speech. In this work, we first systematically study and achieve state of the art results on each of these three distortions individually. Next, we demonstrate a neural network pipeline that cascades a time domain convolutional neural network with a time-frequency domain convolutional neural network to address all three distortions jointly. We observe that such a cascade achieves good performance while having the added benefit of keeping the action of each neural network component interpretable.

Chairs:

Ann Spriet

Tags:

signal processing society

IEEE icassp 2021

virtual conference

2021

sps

virtual conference icassp 2021

june 6-11 2021

icassp 2021