Normal-to-Lombard Speech Conversion by LSTM Network and BGMM for Intelligibility Enhancement of Telephone Speech
Gang Li, Xiaochen Wang, Ruimin Hu, Huyin Zhang, Shanfa Ke
SPS
Length: 11:03
Environmental noise significantly reduces the intelligibility of speech in telephone conversations. Even when the device outputs clean speech, the listener still has difficulty understanding it. This study focuses on intelligibility enhancement (IENH) of telephone speech in near-end background noise, based on normal-to-Lombard speech conversion. The proposed approach uses a long short-term memory (LSTM) network and a Bayesian Gaussian mixture model (BGMM) to build the speech mapping model. Compared with previous studies, we fully consider the short-term correlations of speech and implement feature mappings with higher-dimensional features and more feature types. Evaluations indicate that the proposed approach achieves better results in both objective and subjective evaluations.