Teacher-Student Learning For Low-Latency Online Speech Enhancement Using Wave-U-Net

Sotaro Nakaoka, Li Li, Shota Inoue, Shoji Makino

SPS

Length: 00:09:44

10 Jun 2021

This paper proposes a low-latency online extension of Wave-U-Net for single-channel speech enhancement, which uses teacher-student learning to reduce system latency while maintaining high enhancement performance. Wave-U-Net is a recently proposed end-to-end source separation method that has achieved remarkable performance in singing voice separation and speech enhancement tasks. Because enhancement is performed in the time domain, Wave-U-Net can efficiently model phase information and avoids the limitations of the domain transformation required by conventional methods, which normally operate in the time-frequency domain. We aim to apply Wave-U-Net to face-to-face applications such as hearing aids and in-car communication systems, where a strict latency of less than 10 ms is required. To this end, we investigate online versions of Wave-U-Net and propose using teacher-student learning to avoid the performance degradation caused by shortening the input segment so that the system delay on a CPU is less than 10 ms. The experimental results show that the proposed model can run in real time with low latency while achieving a signal-to-distortion ratio improvement of about 8.35 dB.
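The quantities named in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. The 16 kHz sampling rate, the plain energy-ratio definition of SDR (the paper may use a different evaluation toolkit), and the weighted mean-squared-error form of the teacher-student loss with its `alpha` parameter are all assumptions made for illustration. All function names are hypothetical.

```python
import math


def segment_latency_ms(segment_samples, sample_rate_hz):
    """Buffering delay in ms from collecting one input segment."""
    return 1000.0 * segment_samples / sample_rate_hz


def sdr_db(reference, estimate):
    """Plain SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    Assumption: a simple energy ratio, not necessarily the metric
    used in the paper's evaluation.
    """
    signal_energy = sum(s * s for s in reference)
    distortion_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_energy / distortion_energy)


def sdr_improvement_db(reference, noisy, enhanced):
    """SDR of the enhanced signal minus SDR of the noisy input."""
    return sdr_db(reference, enhanced) - sdr_db(reference, noisy)


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def teacher_student_loss(student_out, teacher_out, clean, alpha=0.5):
    """Hypothetical loss: the student (short input segment) is pulled
    toward both the clean target and the output of a teacher model
    that sees a longer segment. The weighting is an assumption.
    """
    return alpha * mse(student_out, clean) + (1.0 - alpha) * mse(student_out, teacher_out)


if __name__ == "__main__":
    # Under the assumed 16 kHz rate, a 160-sample segment takes 10 ms to buffer.
    print(segment_latency_ms(160, 16000))
```

Under these assumptions, buffering alone consumes the whole 10 ms budget once the segment reaches 160 samples, which is why shortening the input segment matters. Compute time on the CPU adds to that delay.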

Chairs:

Timo Gerkmann

Tags:

signal processing society

IEEE icassp 2021

virtual conference

june 6-11 2021