IMPROVING FAIRNESS AND ROBUSTNESS IN END-TO-END SPEECH RECOGNITION THROUGH UNSUPERVISED CLUSTERING

Irina-Elena Veliche (Meta); Pascale Fung (Hong Kong University of Science and Technology)

07 Jun 2023

The challenge of fairness arises when Automatic Speech Recognition (ASR) systems do not perform equally well for all sub-groups of the population. In the past few years there have been many improvements in overall speech recognition quality, but without any particular focus on advancing equality and equity for the user groups for whom systems do not perform well. ASR fairness is therefore also a robustness issue. Meanwhile, data privacy also takes priority in production systems. In this paper, we present a privacy-preserving approach to improve the fairness and robustness of end-to-end ASR without using metadata, zip codes, or even speaker or utterance embeddings directly in training. We extract utterance-level embeddings using a speaker ID model trained on a public dataset, which we then use in an unsupervised fashion to create acoustic clusters. We use cluster IDs instead of speaker utterance embeddings as extra features during model training, which shows improvements for all demographic groups and in particular for different accents.
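
The pipeline described in the abstract (utterance-level embeddings, unsupervised clustering, cluster IDs as extra training features) can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the paper: the embeddings are random stand-ins for the output of a speaker ID model, k-means is used as the clustering method, the number of clusters is arbitrary, and the cluster ID is fed to the model as a one-hot vector appended to each acoustic frame. The function names `assign`, `fit_kmeans`, and `add_cluster_feature` are hypothetical.

```python
import numpy as np


def assign(embeddings, centroids):
    """Return the index of the nearest centroid for every embedding."""
    # Squared Euclidean distance from each embedding to each centroid.
    diffs = embeddings[:, None, :] - centroids[None, :, :]
    return (diffs ** 2).sum(axis=-1).argmin(axis=1)


def fit_kmeans(embeddings, num_clusters, num_iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice of clustering method)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from randomly chosen distinct utterances.
    init = rng.choice(len(embeddings), size=num_clusters, replace=False)
    centroids = embeddings[init].copy()
    for _ in range(num_iters):
        ids = assign(embeddings, centroids)
        for k in range(num_clusters):
            members = embeddings[ids == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
    return centroids


def add_cluster_feature(frames, cluster_id, num_clusters):
    """Append a one-hot cluster ID to every frame of one utterance."""
    one_hot = np.zeros((frames.shape[0], num_clusters), dtype=frames.dtype)
    one_hot[:, cluster_id] = 1.0
    return np.concatenate([frames, one_hot], axis=1)


# Synthetic stand-ins: 200 utterance embeddings of dimension 16, and one
# utterance of 50 frames with 80 acoustic features per frame.
NUM_CLUSTERS = 4
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(200, 16))
frames = rng.normal(size=(50, 80))

centroids = fit_kmeans(embeddings, NUM_CLUSTERS)
cluster_ids = assign(embeddings, centroids)

# Only the cluster ID of the utterance, not its embedding, reaches training.
augmented = add_cluster_feature(frames, int(cluster_ids[0]), NUM_CLUSTERS)
print(augmented.shape)
```

In this sketch the raw embedding never enters the training features; only a coarse cluster index does, which is the privacy-motivated substitution the abstract describes.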

Tags:

New algorithms and approaches for speech recognition

Conference: IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person)
