Low-Latency Single Channel Speech Enhancement Using U-Net Convolutional Neural Networks

Ahmet E. Bulut, Kazuhito Koishida

Length: 15:24

04 May 2020

Single-channel speech enhancement (SE) can be described, in its simplest terms, as learning a transformation from single-channel noisy speech to clean speech. To do this, we propose a simple but effective U-Net convolutional neural network (CNN) architecture with skip connections, aimed at real-time applications that require low-latency processing. To that end, we process relatively small temporal windows and apply time-frequency (T-F) featurization to them for magnitude estimation. Two state-of-the-art systems are chosen for benchmarking: one operating in the spectral domain [1] and the other in the time domain [2]. We evaluate the systems in terms of perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). Experimental results show that, in terms of PESQ, the proposed method provides around 27% and 11% relative improvement over the two baseline systems, respectively, and has significantly lower latency than both. We further investigate the trade-off between performance and overall latency of the proposed system.
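The window-based, magnitude-domain processing described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the naive DFT, the reuse of the noisy phase for resynthesis, and the latency formula (window buffering only, ignoring compute time and any look-ahead) are all assumptions made for the example. The `estimate_magnitude` callable is a hypothetical stand-in for the U-Net, which the listing does not specify in detail.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) DFT; adequate for one short illustrative window."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each time-domain sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def enhance_frame(noisy_frame, estimate_magnitude):
    """Split one window into magnitude and phase, replace the magnitude with
    an estimate, and resynthesize using the noisy phase (an assumption)."""
    spectrum = dft(noisy_frame)
    magnitude = [abs(c) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    # Hypothetical stand-in for the network: maps noisy to clean magnitudes.
    clean_magnitude = estimate_magnitude(magnitude)
    enhanced = [cmath.rect(m, p) for m, p in zip(clean_magnitude, phase)]
    return idft(enhanced)


def window_latency_ms(window_samples, sample_rate_hz):
    """Buffering delay of one analysis window, in milliseconds.
    Assumes no look-ahead and ignores model compute time."""
    return 1000.0 * window_samples / sample_rate_hz


def relative_improvement_pct(proposed, baseline):
    """Relative improvement of a score (e.g. PESQ) over a baseline, in percent."""
    return 100.0 * (proposed - baseline) / baseline


if __name__ == "__main__":
    sample_rate_hz = 16000  # assumed sampling rate, not taken from the source
    window_samples = 64     # assumed window size, kept small for the naive DFT
    noisy = [math.sin(2 * math.pi * 440 * t / sample_rate_hz) for t in range(window_samples)]
    # Identity "estimator": the frame should be reconstructed unchanged.
    restored = enhance_frame(noisy, lambda mag: mag)
    print(max(abs(a - b) for a, b in zip(noisy, restored)))
    print(window_latency_ms(window_samples, sample_rate_hz))
```

The latency helper makes the trade-off mentioned at the end of the abstract concrete: a shorter window lowers the buffering delay but gives the estimator less temporal context and coarser frequency resolution per frame.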

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020
