Warp-Q: Quality Prediction For Generative Neural Speech Codecs

Wissam Jassim, Jan Skoglund, Michael Chinen, Andrew Hines

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 00:15:14

09 Jun 2021

Good speech quality has been achieved using waveform matching and parametric reconstruction coders. Recently developed very low bit rate generative codecs can reconstruct high quality wideband speech with bit streams less than 3 kb/s. These codecs use a DNN with parametric input to synthesise high quality speech outputs. Existing objective speech quality models (e.g., POLQA, VISQOL) do not accurately predict the quality of coded speech from these generative models underestimating quality due to signal differences not highlighted in subjective listening tests. We present WARP-Q, a full-reference objective speech quality metric that uses dynamic time warping cost for MFCC speech representations. It is robust to small perceptual signal changes. Evaluation using waveform matching, parametric and generative neural vocoder based codecs as well as channel and environmental noise shows that WARP-Q has better correlation and codec quality ranking for novel codecs compared to traditional metrics in addition to versatility for general quality assessment scenarios.

Chairs:

Ina Kodrasi

Tags:

signal processing society

IEEE icassp 2021

virtual conference

2021

sps

virtual conference icassp 2021

june 6-11 2021

icassp 2021

Warp-Q: Quality Prediction For Generative Neural Speech Codecs

Wissam Jassim, Jan Skoglund, Michael Chinen, Andrew Hines

Value-Added Bundle(s) Including this Product

ICASSP 2021 Virtual Conference - Presentation Videos Product Bundle

More Like This

Welcome and Opening Remarks for the IEEE SustainTech Leadership Forum

Panel: Building Sustainable Cities for Tomorrow

Panel: Unleashing the Potential of Virtual Power Plants for Sustainable Energy Solutions

Join the IEEE Signal Processing Society