SQAPP: No-Reference Speech Quality Assessment via Pairwise Preference

Pranay Manocha, Adam Finkelstein, Zeyu Jin

Length: 00:14:18

12 May 2022

Automatic speech quality assessment remains challenging because we lack complete models of human auditory perception. Many existing full-reference models correlate well with human perception, but they cannot be used in real-world scenarios where ground-truth clean reference recordings are not available. No-reference metrics, on the other hand, typically suffer from several shortcomings, such as a lack of robustness to unseen perturbations and a reliance on (limited) labeled data for training. Moreover, noise or large variance among the labels makes it difficult to learn generalizable representations, especially for recordings with subtle differences. This paper proposes a learning framework for estimating the quality of a recording without any reference and without any human judgments. The main component of this framework is a pairwise quality-preference strategy that reduces label noise, thereby making learning more robust. From pairwise preferences, we first learn a content-invariant quality ordering, and then we re-target the model to predict quality on an absolute scale. We show that the resulting learned metric is well calibrated with human judgments. Because it is a deep network, the metric is differentiable, which makes it suitable as a loss function for downstream tasks. For example, we show that adding this metric to an existing speech enhancement method yields significant improvement.
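The pairwise-preference idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: a linear scorer stands in for the deep network, each recording is a hypothetical two-number feature vector (a quality-related value and a content-related value), and the loss is a generic logistic loss on the score margin. The abstract does not state which loss or architecture SQAPP uses, and the re-targeting step to an absolute scale is not shown here.

```python
import math
import random

def score(w, x):
    """Linear quality score; a stand-in (assumption) for the deep network."""
    return sum(wi * xi for wi, xi in zip(w, x))

def pairwise_loss(w, better, worse):
    """Logistic loss on the score margin: small when `better` outscores `worse`."""
    margin = score(w, better) - score(w, worse)
    return math.log1p(math.exp(-margin))

def make_pairs(n, seed=0):
    """Hypothetical data: both items in a pair share content, differ in quality."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        content = rng.uniform(-1.0, 1.0)          # same content in both items
        q_hi = rng.uniform(0.0, 1.0)              # quality feature, preferred item
        q_lo = q_hi - rng.uniform(0.1, 0.5)       # strictly lower quality
        pairs.append(([q_hi, content], [q_lo, content]))
    return pairs

def train(pairs, dim, lr=0.1, epochs=200):
    """Plain stochastic gradient descent on the pairwise logistic loss."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            margin = score(w, better) - score(w, worse)
            # d/d(margin) of log(1 + exp(-margin)) is -1 / (1 + exp(margin))
            g = -1.0 / (1.0 + math.exp(margin))
            for i in range(dim):
                w[i] -= lr * g * (better[i] - worse[i])
    return w

pairs = make_pairs(50)
w = train(pairs, dim=2)
mean_loss = sum(pairwise_loss(w, b, c) for b, c in pairs) / len(pairs)
```

Because both items in each toy pair carry the same content value, the content weight never receives a gradient and stays at zero, while the quality weight grows. This is a simplified picture of why comparing recordings that share content can push a model toward a content-invariant ordering. Only relative preferences enter the loss, so no absolute quality label is needed at this stage.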

Tags:

perceptual metric

speech quality

audio quality

speech enhancement

no-reference metric

pairwise preference

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
