SELF-SUPERVISED ACCENT LEARNING FOR UNDER-RESOURCED ACCENTS USING NATIVE LANGUAGE DATA

Mehul Kumar (Samsung Research); Jiyeon Kim (Samsung Research); Dhananjaya Gowda (Samsung Electronics); Abhinav Garg (Stanford); Chanwoo Kim (Samsung Electronics)

07 Jun 2023

In this paper, we propose a novel method to improve the accuracy of an English speech recognizer for a target accent using data from the corresponding native language. Collecting labeled data for every accent of English to train an end-to-end neural speech recognizer is difficult and expensive, and even finding a representative pool of English speakers for an arbitrary accent to collect unlabeled data can be hard. Collecting unlabeled speech in any native language, however, is a much simpler task. Crucially, the accents of most non-native English speakers are heavily shaped by the co-articulation of sounds in their native language. In view of this, we propose to use unlabeled native-language data to learn self-supervised representations during the pre-training stage; the pre-trained model is then fine-tuned on limited labeled English data for the target accent. Experiments in which an English recognizer is pre-trained on native-language data and then fine-tuned on target-accented English show significant word-error-rate improvements on four different accents (Great Britain, Korean, Chinese, Spanish).
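The two-stage recipe the abstract describes can be sketched in a few lines of PyTorch. This is a hypothetical illustration, not the authors' code: the encoder, the masked-frame reconstruction objective (standing in for whatever self-supervised loss the paper actually uses), and all dimensions are assumptions. Stage 1 pre-trains the encoder on unlabeled native-language speech; stage 2 discards the self-supervised head and fine-tunes with a CTC head on a small labeled accented-English set.

```python
# Hypothetical sketch of the pre-train / fine-tune pipeline described above.
# The masked-reconstruction loss is a stand-in for the paper's actual
# self-supervised objective; shapes and sizes are illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)

class Encoder(nn.Module):
    """Shared acoustic encoder used in both training stages."""
    def __init__(self, feat_dim=40, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)

    def forward(self, x):           # x: (batch, time, feat_dim)
        out, _ = self.rnn(x)
        return out                  # (batch, time, hidden)

def pretrain_step(encoder, recon_head, feats, mask_prob=0.15):
    # Self-supervised: zero out random frames, predict the original features.
    mask = torch.rand(feats.shape[:2]) < mask_prob
    corrupted = feats.clone()
    corrupted[mask] = 0.0
    pred = recon_head(encoder(corrupted))
    return ((pred - feats)[mask] ** 2).mean()

def finetune_step(encoder, ctc_head, feats, targets, in_lens, tgt_lens):
    # Supervised: CTC loss on the limited labeled accented-English data.
    logp = ctc_head(encoder(feats)).log_softmax(-1).transpose(0, 1)  # (T, N, C)
    return nn.functional.ctc_loss(logp, targets, in_lens, tgt_lens, blank=0)

encoder = Encoder()
recon_head = nn.Linear(128, 40)   # pre-training head, discarded after stage 1
ctc_head = nn.Linear(128, 30)     # e.g. 29 output symbols + CTC blank

# Stage 1: unlabeled native-language speech (e.g. Korean, for Korean-accented English)
native_feats = torch.randn(8, 50, 40)
loss_pt = pretrain_step(encoder, recon_head, native_feats)

# Stage 2: limited labeled target-accented English
eng_feats = torch.randn(4, 50, 40)
targets = torch.randint(1, 30, (4, 10))
loss_ft = finetune_step(encoder, ctc_head, eng_feats, targets,
                        torch.full((4,), 50), torch.full((4,), 10))
```

In practice both losses would drive an optimizer over many batches; the point of the sketch is only that the encoder weights carry over from the unlabeled native-language stage into the labeled fine-tuning stage.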
