SELF-SUPERVISED ACCENT LEARNING FOR UNDER-RESOURCED ACCENTS USING NATIVE LANGUAGE DATA
Mehul Kumar (Samsung Research); Jiyeon Kim (Samsung Research); Dhananjaya Gowda (Samsung Electronics); Abhinav Garg (Stanford); Chanwoo Kim (Samsung Electronics)
SPS
In this paper, we propose a novel method to improve the accuracy of an English speech recognizer on a target accent using data from the corresponding native language. Collecting labeled data for every accent of English to train an end-to-end neural English speech recognizer is difficult and expensive. Even finding a representative pool of English speakers for an arbitrary accent in order to collect unlabeled data can be hard. In contrast, collecting unlabeled speech in a speaker's native language is far simpler. It is important to note that the accents of most non-native English speakers are heavily influenced by the co-articulation of sounds in their own native language. In view of this, we propose to use unlabeled native language data to learn self-supervised representations during the pre-training stage. The pre-trained model is then fine-tuned using limited labeled English data for the target accent. Experiments in which an English recognizer is pre-trained on native language data and then fine-tuned on target-accented English show significant improvements in word error rate on four different accents (Great Britain, Korean, Chinese, Spanish).
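The two-stage recipe the abstract describes can be sketched in code. Everything below is an illustrative stand-in, not the authors' actual system: a toy Conv+Linear encoder, a masked contrastive (InfoNCE-style) objective for self-supervised pre-training on unlabeled native-language features, and CTC fine-tuning on a small labeled accented-English set. All shapes and data are synthetic.

```python
# Hedged sketch of "pre-train on unlabeled native-language audio, then
# fine-tune on limited labeled accented English". Model, losses, and data
# are assumed stand-ins chosen for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
B, T, N_MELS, DIM, VOCAB = 4, 50, 40, 64, 30  # batch, frames, mel bins, hidden, labels

class SpeechEncoder(nn.Module):
    """Tiny frame encoder standing in for a real end-to-end ASR encoder."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(N_MELS, DIM, kernel_size=3, padding=1)
        self.proj = nn.Linear(DIM, DIM)

    def forward(self, feats):                     # feats: (B, T, N_MELS)
        x = F.relu(self.conv(feats.transpose(1, 2)))
        return self.proj(x.transpose(1, 2))       # (B, T, DIM)

def pretrain_step(encoder, opt, feats, mask_t=10, n_neg=8):
    """Self-supervised step: mask one frame and train the encoder to pick the
    true (unmasked) representation of that frame against in-utterance negatives."""
    masked = feats.clone()
    masked[:, mask_t] = 0.0                       # crude time-masking
    context = encoder(masked)                     # predictions from masked input
    with torch.no_grad():
        targets = encoder(feats)                  # targets from clean input
    loss = 0.0
    for b in range(B):
        neg_t = torch.randint(0, T, (n_neg,))     # negatives (may rarely hit mask_t)
        cands = torch.cat([targets[b, mask_t:mask_t + 1], targets[b, neg_t]])
        logits = F.cosine_similarity(context[b, mask_t].unsqueeze(0), cands) / 0.1
        loss = loss + F.cross_entropy(logits.unsqueeze(0),
                                      torch.zeros(1, dtype=torch.long))
    loss = loss / B
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def finetune_step(encoder, head, opt, feats, transcripts):
    """Supervised step: CTC on a small labeled accented-English batch."""
    log_probs = F.log_softmax(head(encoder(feats)), dim=-1)   # (B, T, VOCAB)
    loss = F.ctc_loss(log_probs.transpose(0, 1), transcripts, # CTC wants (T, B, C)
                      input_lengths=torch.full((B,), T),
                      target_lengths=torch.full((B,), transcripts.size(1)))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

encoder = SpeechEncoder()

# Stage 1: pre-train on unlabeled native-language audio (synthetic here).
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
native_feats = torch.randn(B, T, N_MELS)
pre_loss = pretrain_step(encoder, opt, native_feats)

# Stage 2: fine-tune encoder + output head on limited labeled accented English.
head = nn.Linear(DIM, VOCAB)
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
accented_feats = torch.randn(B, T, N_MELS)
labels = torch.randint(1, VOCAB, (B, 12))         # label 0 reserved for CTC blank
ft_loss = finetune_step(encoder, head, opt, accented_feats, labels)
print(f"pretrain loss {pre_loss:.3f}, fine-tune loss {ft_loss:.3f}")
```

Because the contrastive target comes from the clean input while the prediction comes from the masked one, the objective is non-trivial even in this toy setting; in practice the pre-trained encoder weights are what carry the native-language phonetic structure into the fine-tuning stage.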