TORCHAUDIO-SQUIM: REFERENCE-LESS SPEECH QUALITY AND INTELLIGIBILITY MEASURES IN TORCHAUDIO

Anurag Kumar (Facebook Reality Labs); Ke Tan (Meta Platforms, Inc.); Zhaoheng Ni (Meta); Pranay Manocha (Princeton University); Xiaohui Zhang (Meta); Ethan Henderson (Meta Reality Labs Research); Buye Xu (Meta Reality Labs Research)


07 Jun 2023

Measuring the quality and intelligibility of a speech signal is usually a critical step in the development of speech processing systems. To support this, a variety of metrics that measure quality and intelligibility under different assumptions have been developed. In this paper, we introduce tools and a set of models that estimate such established metrics using deep neural networks. These models are made available in the well-established TorchAudio library, the core audio and speech processing library within the PyTorch deep learning framework. We refer to this toolkit as TorchAudio-Squim (TorchAudio Speech QUality and Intelligibility Measures). More specifically, in the current version of TorchAudio-Squim, we establish and release models for estimating PESQ, STOI, and SI-SDR among objective metrics, and MOS among subjective metrics. We develop a novel approach for objective metric estimation and use a recently developed approach for subjective metric estimation. These models operate in a “reference-less” manner, that is, they do not require the corresponding clean speech as a reference for speech assessment. Given that clean speech is often unavailable and subjective evaluation is effortful in real-world situations, such easy-to-use tools would greatly benefit speech processing research and development.
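To make the "reference-less" distinction concrete, the minimal NumPy sketch below computes SI-SDR the conventional, intrusive way, which requires the clean reference signal. It is not code from TorchAudio-Squim; the function name `si_sdr` and the small `eps` guard are illustrative assumptions. A reference-less model of the kind described above is trained to predict this quantity from the degraded signal alone.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (intrusive: needs the clean reference).

    Illustrative sketch; `eps` is an assumed guard against division by zero.
    """
    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference: alpha is the optimal scaling factor.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Everything not explained by the scaled reference counts as distortion.
    residual = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return float(10.0 * np.log10(ratio))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)               # 1 s of synthetic "speech" at 16 kHz
    noisy = clean + 0.1 * rng.standard_normal(16000)  # additive noise roughly 20 dB below the signal
    print(f"SI-SDR: {si_sdr(noisy, clean):.2f} dB")
```

Because the reference is rescaled by `alpha`, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is the "scale-invariant" property. The catch is the `reference` argument itself: in real-world recordings it is usually unavailable, which is the gap reference-less estimation addresses.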

Tags:

Audio and speech quality and intelligibility measures


Presented at IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person conference); included in the conference's Presentation Videos product bundle.

More Like This

SpeechLMScore: Evaluating speech generation using speech language model

EFFICIENT INTELLIGIBILITY EVALUATION USING KEYWORD SPOTTING: A STUDY ON AUDIO-VISUAL SPEECH ENHANCEMENT

On the robustness of non-intrusive speech quality model by adversarial examples
