Performance Study Of A Convolutional Time-Domain Audio Separation Network For Real-Time Speech Denoising
Christian Schüldt, Samuel Sonning, Hakan Erdogan, Scott Wisdom
SPS
Length: 13:56
Time-domain audio separation networks based on dilated temporal convolutions have recently been shown to perform very well in speech separation tasks compared to methods based on a time-frequency representation, even outperforming an oracle binary time-frequency mask of the speakers. This paper investigates the performance of such a time-domain network (Conv-TasNet) for speech denoising in a real-time setting, comparing various parameter settings. Most importantly, different amounts of lookahead are evaluated and compared against the baseline of a fully causal model. We show that a large part of the performance gain of a non-causal model over a causal one is already achieved with a lookahead of only 20 milliseconds, demonstrating the usefulness of even small lookaheads for many real-time applications.
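To make the notion of lookahead concrete, the following is a minimal sketch of a single dilated 1-D convolution whose output at time t may see a limited number of future samples. It is a generic illustration, not the paper's network or configuration: the function names `lookahead_samples` and `dilated_conv1d` are hypothetical, and the 16 kHz sample rate in the usage note is an assumption that does not come from the abstract.

```python
import numpy as np


def lookahead_samples(lookahead_ms, sample_rate_hz):
    """Convert a lookahead given in milliseconds to a whole number of samples."""
    return int(round(lookahead_ms * sample_rate_hz / 1000.0))


def dilated_conv1d(x, kernel, dilation=1, lookahead=0):
    """Dilated 1-D convolution with a bounded lookahead (hypothetical helper).

    The output at index t depends on inputs x[t + lookahead - i * dilation]
    for i = 0 .. len(kernel) - 1. With lookahead=0 the filter is fully causal;
    a larger lookahead shifts part of the receptive field into the future.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = (len(kernel) - 1) * dilation  # receptive field length minus one
    if not 0 <= lookahead <= span:
        raise ValueError("lookahead must lie between 0 and (len(kernel) - 1) * dilation")

    # Zero-pad so that the output has the same length as the input:
    # (span - lookahead) samples of past context, lookahead samples of future context.
    padded = np.pad(x, (span - lookahead, lookahead))

    y = np.zeros(len(x))
    for i, w in enumerate(kernel):
        start = span - i * dilation  # tap i reads x[t + lookahead - i * dilation]
        y += w * padded[start:start + len(x)]
    return y
```

Under the assumed 16 kHz sample rate, `lookahead_samples(20, 16000)` gives 320 samples for the 20 millisecond lookahead mentioned above. In a stacked network the total lookahead is the sum of the per-layer lookaheads, so a real-time system would have to budget this amount across all layers; how the paper distributes it is not stated in the abstract.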