Improving Robustness Of Deep Learning Based Monaural Speech Enhancement Against Processing Artifacts
Ke Tan, DeLiang Wang
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 12:03
In voice telecommunication, the intelligibility and quality of speech signals can be severely degraded by background noise if the speaker at the transmitting end talks in a noisy environment. Therefore, a speech enhancement system is typically integrated into the transmitter device or the receiver device. Without knowing whether the other end is equipped with a speech enhancer, the transmitter and receiver devices may both process a speech signal with their own speech enhancers. In this study, we find that enhancing a speech signal twice can dramatically degrade enhancement performance. This is because the downstream speech enhancer is sensitive to the processing artifacts introduced by the upstream enhancer. We analyze this problem and propose a new training scheme for the downstream deep learning based speech enhancement model. Our experimental results show that the proposed training strategy substantially improves the robustness of speech enhancers against artifacts induced by another speech enhancer.