ATTENUATION OF ACOUSTIC EARLY REFLECTIONS IN TELEVISION STUDIOS USING PRETRAINED SPEECH SYNTHESIS NEURAL NETWORK
Tomer Rosenbaum, Israel Cohen, Emil Winebrand
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 00:11:11
Machine learning and digital signal processing have been used extensively to enhance speech. However, methods for reducing early reflections in studio settings usually target the physical characteristics of the room. In this paper, we address the problem of acoustic early reflections in television studios and control rooms, and propose a two-stage method that exploits the knowledge of a pretrained speech synthesis generator. First, given a degraded speech signal that includes the direct sound and early reflections, a U-Net convolutional neural network attenuates the early reflections in the spectral domain. Then, a pretrained speech synthesis generator reconstructs the phase to predict an enhanced speech signal in the time domain. Qualitative and quantitative experimental results show that the method achieves studio-quality speech enhancement.
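The signal model and the two-stage structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the reflection delays and gains, the STFT frame length and hop, and all function names are assumptions chosen for illustration, and the `attenuator` and `generator` arguments are hypothetical stand-ins for the U-Net and the pretrained speech synthesis generator.

```python
import numpy as np


def add_early_reflections(direct, sample_rate, delays_ms, gains):
    """Toy degradation model: direct sound plus delayed, attenuated copies."""
    out = direct.astype(np.float64).copy()
    for delay_ms, gain in zip(delays_ms, gains):
        shift = int(round(delay_ms * 1e-3 * sample_rate))
        if 0 < shift < len(direct):
            out[shift:] += gain * direct[:-shift]
    return out


def magnitude_spectrogram(signal, frame_len=512, hop=128):
    """Magnitude STFT with a Hann window; shape (frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def enhance(degraded, attenuator, generator):
    """Two-stage pipeline: spectral attenuation, then waveform synthesis."""
    mag = magnitude_spectrogram(degraded)
    clean_mag = attenuator(mag)   # stage 1: hypothetical stand-in for the U-Net
    return generator(clean_mag)   # stage 2: hypothetical stand-in for the generator


# Usage with assumed values: 1 s of noise at 16 kHz, two reflections.
rng = np.random.default_rng(0)
direct = rng.standard_normal(16000)
degraded = add_early_reflections(direct, 16000, delays_ms=[5, 12], gains=[0.6, 0.4])
# Identity placeholders keep the sketch runnable without any trained model.
result = enhance(degraded, attenuator=lambda m: m, generator=lambda m: m)
```

With the identity placeholders, `enhance` simply returns the magnitude spectrogram of the degraded signal. In the method described above, the second stage would instead produce a time-domain waveform, which is why only magnitudes (and no phase) are passed between the stages.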