WeavSpeech: Data Augmentation Strategy for Automatic Speech Recognition via Semantic-Aware Weaving

Kyusung Seo (KAIST); Joonhyung Park (KAIST); Jaeyun Song (KAIST); Eunho Yang (KAIST)

07 Jun 2023

Cut-and-paste data augmentation has attracted considerable attention in the vision community due to its simplicity and effectiveness in improving generalization performance. However, applying this type of augmentation to Automatic Speech Recognition (ASR) tasks is challenging because the segments corresponding to specific output tokens (e.g., words or sub-words) vary in length. Furthermore, if speech signals are mixed indiscriminately without considering semantics, there is a risk of generating nonsensical sentences. To address these issues, we propose WeavSpeech, a simple yet effective cut-and-paste augmentation method for ASR tasks that weaves a pair of speech samples while taking semantics into account. Our method can be applied to any language without requiring language-specific knowledge and can be seamlessly integrated with other established augmentations. We validate the effectiveness of our method on representative ASR benchmark datasets, including LibriSpeech and WSJ.
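To make the variable-length problem concrete, the following is a minimal sketch of a generic cut-and-paste between two utterances. It is not the WeavSpeech algorithm, whose details the abstract does not give. It assumes that word-level alignments (word, start sample, end sample) are already available, for example from a forced aligner, and that the caller has already chosen which spans to exchange. The function name `weave` and its parameters are hypothetical, and the semantic selection step that the abstract emphasizes is deliberately left out.

```python
import numpy as np

def weave(wave_a, words_a, wave_b, words_b, span_a, span_b):
    """Replace words span_a of utterance A with words span_b of utterance B.

    Hypothetical helper for illustration only.
    words_* are lists of (word, start_sample, end_sample) tuples.
    span_* are half-open (first_word_index, last_word_index_exclusive) pairs.
    """
    i, j = span_a
    k, l = span_b
    if not (0 <= i < j <= len(words_a)) or not (0 <= k < l <= len(words_b)):
        raise ValueError("spans must be non-empty and within range")

    # Sample boundaries of the region cut out of A and the region taken from B.
    a_start, a_end = words_a[i][1], words_a[j - 1][2]
    b_start, b_end = words_b[k][1], words_b[l - 1][2]

    # The pasted region may be longer or shorter than the removed one,
    # so the output length generally differs from len(wave_a).
    wave = np.concatenate([wave_a[:a_start], wave_b[b_start:b_end], wave_a[a_end:]])

    # Apply the same splice to the transcript so audio and label stay consistent.
    text = (
        [w for w, _, _ in words_a[:i]]
        + [w for w, _, _ in words_b[k:l]]
        + [w for w, _, _ in words_a[j:]]
    )
    return wave, text
```

Because the audio and the transcript are spliced at the same word boundaries, the augmented pair stays aligned even when the exchanged segments differ in duration. Nothing in this sketch prevents a nonsensical sentence, which is the gap that a semantics-aware choice of spans is meant to close.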

Tags: Resource constrained speech recognition

Conference: IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person)
