WeavSpeech: Data Augmentation Strategy for Automatic Speech Recognition via Semantic-Aware Weaving
Kyusung Seo (KAIST); Joonhyung Park (KAIST); Jaeyun Song (KAIST); Eunho Yang (KAIST)
Cut-and-paste data augmentation has attracted considerable attention in the vision community due to its simplicity and effectiveness in improving generalization performance. However, applying this type of augmentation to Automatic Speech Recognition (ASR) tasks is challenging because the segments corresponding to specific output tokens (e.g., words or sub-words) vary in length. Furthermore, if speech signals are mixed indiscriminately without considering semantics, there is a risk of generating nonsensical sentences. To address these issues, we propose WeavSpeech, a simple yet effective cut-and-paste augmentation method for ASR tasks that weaves together a pair of speech samples while taking semantics into account. Our method can be applied to any language without requiring language-specific knowledge and can be seamlessly integrated with other established augmentations. We validate the superiority of our method on representative ASR benchmark datasets, including LibriSpeech and WSJ.