
Exploring Effective Data Utilization for Low-Resource Speech Recognition

Zhikai Zhou, Wei Wang, Wangyou Zhang, Yanmin Qian

Length: 00:11:57
12 May 2022

Automatic speech recognition (ASR) suffers severe performance degradation on low-resource languages with limited training data. In this work, we propose a series of training strategies for more effective data utilization in low-resource speech recognition. In low-resource scenarios, multilingual pretraining is of great help, and we exploit the relationships among different languages to improve it. Knowledge extracted from a language classifier is then used to weight the training samples, biasing the model towards the target low-resource language. In addition, we design dynamic curriculum learning as a warm-up strategy and length perturbation as a data augmentation method. Together, these three methods form an improved training strategy for low-resource speech recognition. We evaluate the proposed strategies by pretraining (PT) the model on rich-resource languages and finetuning (FT) it on the target language with limited data. Experimental results on the CommonVoice dataset show that, compared with the commonly used multilingual PT+FT method, the proposed strategies achieve a relative 15-25% reduction in word error rate across different target languages, demonstrating the effectiveness of the proposed data utilization strategy.
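
The abstract gives no implementation details, but two of the ideas can be illustrated with a short sketch. The Python snippet below is a minimal, hypothetical illustration, not the authors' code: it weights each utterance's loss by a language classifier's posterior for the target language, and perturbs utterance length by randomly dropping or duplicating feature frames. All names here (weighted_asr_loss, length_perturb, drop_prob, repeat_prob) are assumptions made for illustration.

    # Minimal sketch of classifier-based data weighting and length
    # perturbation, under the assumptions stated above.
    import torch
    import torch.nn.functional as F

    def weighted_asr_loss(per_sample_loss, lang_logits, target_lang_id):
        """Scale each utterance's ASR loss by the language classifier's
        posterior for the target low-resource language (hypothetical scheme).

        per_sample_loss: (batch,) unreduced ASR losses
        lang_logits:     (batch, n_langs) classifier outputs per utterance
        """
        lang_posterior = F.softmax(lang_logits, dim=-1)   # (batch, n_langs)
        weights = lang_posterior[:, target_lang_id]       # (batch,)
        # Utterances that "sound like" the target language get larger weight,
        # biasing the gradient towards the target low-resource language.
        return (weights * per_sample_loss).sum() / weights.sum()

    def length_perturb(feats, drop_prob=0.05, repeat_prob=0.05):
        """Randomly drop or duplicate frames of a (time, dim) feature matrix,
        so the model sees the same utterance at slightly different lengths."""
        out = []
        for frame in feats:
            r = torch.rand(1).item()
            if r < drop_prob:
                continue                 # drop this frame
            out.append(frame)
            if r > 1.0 - repeat_prob:
                out.append(frame)        # duplicate this frame
        return torch.stack(out) if out else feats

In this sketch the weighting and the augmentation are independent: length_perturb would be applied when batching features, while weighted_asr_loss replaces the plain mean over per-utterance losses during finetuning.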
