A Neural Network For Monaural Intrusive Speech Intelligibility Prediction
Mathias Bach Pedersen, Asger Heidemann Andersen, Søren Holdt Jensen, Jesper Jensen
Monaural intrusive speech intelligibility prediction (SIP) methods aim to predict the speech intelligibility (SI) of a single-microphone noisy and/or processed speech signal using the underlying clean speech signal. In the present work, we propose a neural network for monaural intrusive SIP. The proposed network is trained on data from multiple listening tests to predict SI. To use the available listening test data as efficiently as possible and to facilitate SI prediction for short-duration speech signals, training is based on a local-time intelligibility curve derived from the listening test data. The trained network is evaluated, in terms of rank-order correlation, against the classical monaural intrusive predictors STOI and ESTOI. The network performs best overall, with a Kendall's tau of 0.825 measured on long-duration speech signals, i.e., signals up to several minutes long. For short-term prediction on speech signals of 1 to 10 seconds, the network also shows better performance and smaller prediction variance.
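As a rough illustration of the rank-order evaluation mentioned above, the following minimal sketch computes Kendall's tau between predictor outputs and measured intelligibility scores. The abstract does not state which tau variant is used, so the tau-b form (which accounts for ties) is an assumption here, and the function name `kendall_tau_b` and the example values are hypothetical, not data from the paper.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(predicted, measured):
    """Kendall's tau-b between two equal-length sequences (ties allowed)."""
    if len(predicted) != len(measured):
        raise ValueError("sequences must have the same length")
    concordant = discordant = ties_pred_only = ties_meas_only = 0
    # Compare every unordered pair of conditions.
    for (p1, m1), (p2, m2) in combinations(zip(predicted, measured), 2):
        dp, dm = p1 - p2, m1 - m2
        if dp == 0 and dm == 0:
            continue  # tied in both sequences: contributes to neither term
        if dp == 0:
            ties_pred_only += 1
        elif dm == 0:
            ties_meas_only += 1
        elif dp * dm > 0:
            concordant += 1  # both sequences order the pair the same way
        else:
            discordant += 1  # the sequences order the pair oppositely
    untied = concordant + discordant
    denom = sqrt((untied + ties_pred_only) * (untied + ties_meas_only))
    if denom == 0:
        raise ValueError("tau is undefined when a sequence is constant")
    return (concordant - discordant) / denom


# Hypothetical values: predictor outputs and measured word-correct
# percentages for five listening-test conditions (not from the paper).
predicted = [0.2, 0.4, 0.35, 0.8, 0.9]
measured = [10, 35, 40, 80, 95]
tau = kendall_tau_b(predicted, measured)  # 9 concordant, 1 discordant pair
```

Because tau depends only on the ordering of the scores, it can compare predictors whose outputs lie on different scales without first fitting a mapping to intelligibility percentages.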