Estimating Shapley Values of Training Utterances for Automatic Speech Recognition Models

Ali Raza Syed (The Graduate Center, CUNY); Michael I Mandel (Brooklyn College, CUNY)


09 Jun 2023

Data valuation in machine learning is concerned with quantifying the relative contribution of each training example to a model's performance. Quantifying the importance of training examples is useful for identifying high- and low-quality data, for curating training datasets, and for addressing data quality issues. Shapley values have gained traction in machine learning as a tool for both curating training data and identifying data quality issues. Computing the exact Shapley values of training examples is computationally prohibitive, but approximation methods have been used successfully for classification models in computer vision tasks. We investigate data valuation for automatic speech recognition (ASR) models, which perform a structured prediction task, and propose a method for estimating Shapley values for these models. We show that a proxy model can be learned for the acoustic model component of an end-to-end ASR system and used to estimate Shapley values for acoustic frames. We present a method for using the proxy acoustic model to estimate Shapley values for variable-length utterances and demonstrate that these Shapley values provide a signal of example quality.
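As a rough illustration of the general idea, and not the authors' method, the sketch below estimates frame-level Shapley values by permutation sampling. Each frame's value is its marginal contribution U(S ∪ {i}) − U(S), averaged over random orderings of the training frames. Several pieces are assumptions made for illustration. The utility U is the validation accuracy of a 1-nearest-neighbour classifier over toy 1-D "frames", used as a stand-in for a proxy acoustic model. Utterance scores are the mean of their frames' values. The names `one_nn_accuracy`, `monte_carlo_shapley` and `utterance_values` are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy "frame": (1-D acoustic feature, frame label). Real frames would be
# feature vectors with phone/token targets; 1-D floats keep the sketch small.
Frame = Tuple[float, str]


def one_nn_accuracy(train: Sequence[Frame], subset: Sequence[int], valid: Sequence[Frame]) -> float:
    """Utility U(S): validation accuracy of a 1-nearest-neighbour proxy classifier
    that sees only the training frames indexed by `subset` (empty subset scores 0)."""
    if not subset:
        return 0.0
    correct = 0
    for x, y in valid:
        nearest = min(subset, key=lambda i: abs(train[i][0] - x))
        correct += train[nearest][1] == y
    return correct / len(valid)


def monte_carlo_shapley(n: int, utility: Callable[[List[int]], float],
                        n_permutations: int = 1000, seed: int = 0) -> List[float]:
    """Permutation-sampling Shapley estimate: average each point's marginal
    contribution U(S + {i}) - U(S) over random orderings of all n points."""
    rng = random.Random(seed)
    values = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        subset: List[int] = []
        prev = utility(subset)
        for i in order:
            subset.append(i)
            curr = utility(subset)
            values[i] += curr - prev
            prev = curr
    return [v / n_permutations for v in values]


def utterance_values(frame_values: Sequence[float], frame_to_utt: Sequence[str]) -> Dict[str, float]:
    """Hypothetical aggregation: score each utterance by the mean value of its frames."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for value, utt in zip(frame_values, frame_to_utt):
        sums[utt] = sums.get(utt, 0.0) + value
        counts[utt] = counts.get(utt, 0) + 1
    return {utt: sums[utt] / counts[utt] for utt in sums}


# Toy data: frame 3 sits in the "b" region but carries label "a" (a bad label).
train = [(0.0, "a"), (0.2, "a"), (1.0, "b"), (1.1, "a")]
valid = [(0.05, "a"), (1.08, "b")]
frame_to_utt = ["utt_clean", "utt_clean", "utt_noisy", "utt_noisy"]

frame_values = monte_carlo_shapley(len(train), lambda s: one_nn_accuracy(train, s, valid))
utt_values = utterance_values(frame_values, frame_to_utt)
print([round(v, 3) for v in frame_values])
print(utt_values)
```

For this toy setup the exact Shapley values are 5/24 (about 0.21) for each of frames 0 to 2 and −1/8 for the mislabeled frame 3. The estimates should therefore land near those numbers, and `utt_noisy` should score below `utt_clean`. As written, naive permutation sampling re-evaluates the utility n times per permutation, so it does not scale to real ASR training sets.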

Tags:

Machine learning methods for language


Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

SELF SUPERVISED BERT FOR LEGAL TEXT CLASSIFICATION

A Sentiment and Syntactic-Aware Graph Convolutional Network for Aspect-level Sentiment Classification

Egocentric Action Anticipation for Personal Health
