TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement
Yunyang Zeng (Carnegie Mellon University); Joseph Konan (Carnegie Mellon University); Shuo Han (Carnegie Mellon University); Muqiao Yang (Carnegie Mellon University); David Bick (Carnegie Mellon University); Anurag Kumar (Facebook Research); Shinji Watanabe (Carnegie Mellon University); Bhiksha Raj (Carnegie Mellon University)
SPS
Speech enhancement models have progressed greatly in recent years,
but the perceptual quality of their outputs remains limited. We
propose an objective for perceptual quality based on temporal
acoustic parameters. These are fundamental speech features that play
an essential role in various applications, including speaker
recognition and paralinguistic analysis. We provide a differentiable
estimator for four categories of low-level acoustic descriptors:
frequency-related parameters, energy- or amplitude-related
parameters, spectral balance parameters, and temporal features.
Unlike prior work that considers aggregated acoustic parameters or
only a few categories of acoustic parameters, our temporal acoustic
parameter (TAP) loss enables auxiliary optimization and improvement of
many fine-grained speech characteristics in enhancement workflows.
We show that adding TAPLoss as an auxiliary objective in speech
enhancement produces speech with improved perceptual quality and
intelligibility. We use data from the Deep Noise Suppression 2020
Challenge to demonstrate that both time-domain models and
time-frequency domain models can benefit from our method.
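
The idea of an auxiliary, frame-level acoustic parameter objective can be illustrated with a minimal sketch. This is not the estimator or loss from the paper: per-frame log energy stands in for a single temporal acoustic parameter, the mean absolute error is an assumed distance, and the function names, frame sizes, and the weight of 0.1 are all hypothetical. It is written in plain Python for readability; an actual training setup would need the same computation in an automatic differentiation framework.

```python
import math


def frame_log_energy(signal, frame_len=4, hop=2):
    """Per-frame log energy, a stand-in for one temporal acoustic parameter."""
    track = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        track.append(math.log(energy + 1e-8))  # small offset avoids log(0)
    return track


def temporal_parameter_loss(enhanced, clean, frame_len=4, hop=2):
    """Mean absolute difference between the two frame-level parameter tracks."""
    p_enh = frame_log_energy(enhanced, frame_len, hop)
    p_cln = frame_log_energy(clean, frame_len, hop)
    return sum(abs(a - b) for a, b in zip(p_enh, p_cln)) / len(p_cln)


def combined_loss(enhanced, clean, weight=0.1):
    """Base waveform loss plus a weighted auxiliary term (weight is hypothetical)."""
    base = sum(abs(a - b) for a, b in zip(enhanced, clean)) / len(clean)
    return base + weight * temporal_parameter_loss(enhanced, clean)


# Toy signals: the "enhanced" output is the clean signal with a small offset.
clean = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
enhanced = [x + 0.1 for x in clean]
```

Because the parameter track is compared frame by frame rather than after averaging over the utterance, the auxiliary term penalizes mismatches in how the parameter evolves over time, which is the distinction the abstract draws against aggregated acoustic parameters.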