Truth-To-Estimate Ratio Mask: A Post-Processing Method For Speech Enhancement Direct At Low Signal-To-Noise Ratios
He Wang, Bohan Chen, Yue Wei, Richard H.Y. So
SPS
Length: 11:47
This study proposes a bi-directional recurrent neural network (Bi-RNN) post-processing method for speech enhancement (SE) at low signal-to-noise ratios (SNRs). Current speech enhancement solutions perform poorly under low-SNR conditions. Loizou and Kim proposed a solution that reduces speech distortion errors in the time-frequency (T-F) domain, but it requires knowledge of the ground truth (the clean speech). Because the ground truth is unknown in real-life applications, the current study uses a Bi-RNN to implement Loizou and Kim's solution as a post-processing method for SE engines, so the proposed solution does not require prior knowledge of the ground truth. The effectiveness of the proposed method is investigated with three SE engines: spectral subtraction (SS), non-negative matrix factorization (NMF), and a deep neural network estimating the ideal ratio mask (DNN-IRM). Tests cover matched and mismatched noise and a range of SNRs. Experimental results demonstrate that the proposed post-processing method improves both the perceptual evaluation of speech quality (PESQ) and the short-time objective intelligibility (STOI) for all of these SE engines, especially at low SNRs.
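The abstract describes the post-processing target only in words. The Python below is a minimal sketch built on one assumption: the truth-to-estimate ratio mask is the per-T-F-unit ratio of the clean magnitude |S| to the SE engine's estimated magnitude |Ŝ|. That reading comes from the title and is not a definition confirmed by the abstract. The sketch computes this oracle mask and applies it to the estimate. For comparison, it also applies an oracle constraint in the spirit of Loizou and Kim, assuming the commonly cited form that discards units whose amplification distortion exceeds 20·log10(2) ≈ 6.02 dB (|Ŝ| > 2|S|).

Every function here needs |S|, so the code can only produce training targets. At inference, the Bi-RNN would have to predict the mask from the enhanced spectrum alone. The function names, the `eps` and `max_gain` parameters, and the list-of-lists spectrogram layout are hypothetical illustrations.

```python
from typing import List

# Hypothetical layout: frames x frequency bins, non-negative magnitudes.
Spectrogram = List[List[float]]


def truth_to_estimate_mask(clean: Spectrogram, estimate: Spectrogram,
                           eps: float = 1e-8, max_gain: float = 10.0) -> Spectrogram:
    """Oracle mask |S| / |S_hat| per T-F unit (assumed definition).

    eps avoids division by zero. Capping at max_gain is an assumption that
    keeps the regression target bounded when the estimate is near zero.
    """
    return [
        [min(s / (e + eps), max_gain) for s, e in zip(clean_row, est_row)]
        for clean_row, est_row in zip(clean, estimate)
    ]


def apply_mask(estimate: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Post-process the SE engine output by elementwise multiplication."""
    return [
        [e * m for e, m in zip(est_row, mask_row)]
        for est_row, mask_row in zip(estimate, mask)
    ]


def loizou_kim_constraint(clean: Spectrogram, estimate: Spectrogram) -> Spectrogram:
    """Zero units where |S_hat| > 2|S| (amplification distortion above ~6.02 dB).

    All other units are kept unchanged. This is an oracle operation because it needs |S|.
    """
    return [
        [e if e <= 2.0 * s else 0.0 for s, e in zip(clean_row, est_row)]
        for clean_row, est_row in zip(clean, estimate)
    ]


if __name__ == "__main__":
    clean = [[1.0, 0.5], [0.2, 0.0]]
    estimate = [[2.0, 0.5], [1.0, 0.1]]
    mask = truth_to_estimate_mask(clean, estimate)
    print(apply_mask(estimate, mask))               # close to `clean`, up to eps
    print(loizou_kim_constraint(clean, estimate))   # [[2.0, 0.5], [0.0, 0.0]]
```

With a perfect mask, multiplying it into the estimate recovers the clean magnitude up to `eps`. The Loizou-Kim style constraint instead only removes over-amplified units and leaves the rest of the estimate untouched. The `max_gain` cap trades exact recovery in near-silent estimate bins for a bounded target, which is one plausible design choice for training a network to predict the mask.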