Wawenets: A No-Reference Convolutional Waveform-Based Approach To Estimating Narrowband And Wideband Speech Quality
Andrew Catellier, Stephen Voran
SPS
Length: 12:13
Building on prior work, we have developed a no-reference (NR) waveform-based convolutional neural network (CNN) architecture that can accurately estimate the speech quality or intelligibility of narrowband and wideband speech segments. These Wideband Audio Waveform Evaluation Networks, or WAWEnets, achieve very high per-speech-segment correlation (ρ_seg ≥ 0.92, RMSE ≤ 0.38) to established full-reference (FR) quality and intelligibility estimators (PESQ, POLQA, PEMO, STOI). These results are based on over 17 hours of speech from 127 previously unseen talkers speaking in 13 different languages, which is just 10% of our total data. NR correlations at this level across this broad scope are unprecedented. This achievement was made possible by using FR estimates as training targets so that WAWEnets could learn implicit undistorted speech models and exploit them to produce accurate NR estimates.
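As an illustration of the two figures of merit quoted above, the following minimal sketch computes a per-segment correlation and an RMSE between NR estimates and FR targets. It assumes the correlation is a Pearson correlation taken over individual speech segments; the abstract does not spell out the exact evaluation procedure. The function names and the sample scores are hypothetical and are not taken from the paper.

```python
import math


def pearson_correlation(estimates, targets):
    """Pearson correlation between two equal-length score sequences."""
    n = len(estimates)
    if n != len(targets) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_e = sum(estimates) / n
    mean_t = sum(targets) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimates, targets))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_t = sum((t - mean_t) ** 2 for t in targets)
    if var_e == 0 or var_t == 0:
        raise ValueError("correlation is undefined for constant sequences")
    return cov / math.sqrt(var_e * var_t)


def rmse(estimates, targets):
    """Root-mean-square error between estimates and targets."""
    n = len(estimates)
    if n != len(targets) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, targets)) / n)


if __name__ == "__main__":
    # Hypothetical per-segment scores, for illustration only:
    # FR targets (e.g. from a full-reference estimator) and NR estimates.
    fr_targets = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
    nr_estimates = [1.7, 1.9, 2.6, 3.2, 3.4, 4.1]
    print("rho_seg:", pearson_correlation(nr_estimates, fr_targets))
    print("RMSE:", rmse(nr_estimates, fr_targets))
```

Each pair of values corresponds to one speech segment, so both statistics summarize agreement at the segment level rather than after averaging per condition or per talker.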