Improving Accented Speech Recognition with Multi-Domain Training

Lucas Maison (Laboratoire Informatique d'Avignon); Yannick Estève (LIA - Avignon University)

08 Jun 2023

Thanks to the rise of self-supervised learning, automatic speech recognition (ASR) systems now achieve near-human performance on a wide variety of datasets. However, they still generalize poorly and are not robust to domain shifts such as accent variations. In this work, we use speech audio representing four different French accents to create fine-tuning datasets that improve the robustness of pre-trained ASR models. By incorporating various accents in the training set, we obtain both in-domain and out-of-domain improvements. Our numerical experiments show that we can reduce error rates by up to 25% (relative) on African and Belgian accents compared to single-domain training, while maintaining good performance on standard French.
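To make the "25% (relative)" figure concrete, here is a minimal sketch of how a word error rate and a relative reduction between two systems are commonly computed. The abstract does not say which error metric or tooling the authors used, so the function names, the example sentences, and the numbers 0.20 and 0.15 are illustrative assumptions, not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error rate removed by the improved system."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Hypothetical error rates, chosen only to illustrate the arithmetic.
baseline_wer = 0.20      # assumed: model fine-tuned on a single domain
multi_domain_wer = 0.15  # assumed: model fine-tuned on several accents
reduction = relative_reduction(baseline_wer, multi_domain_wer)
print(f"relative reduction: {reduction:.0%}")
```

With these assumed values, an absolute drop of 5 points corresponds to a 25% relative reduction, which is why relative and absolute figures should not be confused when comparing systems.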

Tags:

Resource constrained speech recognition

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
