SpecAugment on Large Scale Datasets

Daniel Park, Yu Zhang, Chung-Cheng Chiu, Youzheng Chen, Bo Li, William Chan, Quoc Le, Yonghui Wu

Length: 13:31

04 May 2020

Recently, SpecAugment, an augmentation scheme for automatic speech recognition that acts directly on the spectrogram of input utterances, has been shown to be highly effective in enhancing the performance of end-to-end networks on public datasets. In this paper, we demonstrate its effectiveness on tasks with large scale datasets by investigating its application to the Google Multidomain Dataset (Narayanan et al., 2018). We achieve improvement across all test domains by mixing raw training data augmented with SpecAugment and noise-perturbed training data when training the acoustic model. We also introduce a modification of SpecAugment that adapts the time mask size and/or multiplicity depending on the length of the utterance, which can potentially benefit large scale tasks. By using adaptive masking, we are able to further improve the performance of the Listen, Attend and Spell model on LibriSpeech to 2.2% WER on test-clean and 5.2% WER on test-other.
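The abstract does not give the exact rule for adapting time masks, so the following is only a minimal sketch of the general idea: both the number of time masks and the maximum mask width grow in proportion to the utterance length. The function name `adaptive_time_mask`, the ratios `mult_ratio` and `size_ratio`, the cap `max_masks`, and the choice to fill masked frames with zeros are all illustrative assumptions, not values or details taken from this page.

```python
import random


def adaptive_time_mask(spec, mult_ratio=0.04, size_ratio=0.04,
                       max_masks=20, rng=None):
    """Return a masked copy of `spec`, a list of frames (lists of floats).

    All default values are hypothetical and chosen only for illustration.
    """
    rng = rng or random.Random()
    n_frames = len(spec)

    # Both the number of masks and the maximum mask width scale with
    # the utterance length (the "adaptive" part of the scheme).
    n_masks = min(max_masks, int(mult_ratio * n_frames))
    max_width = int(size_ratio * n_frames)

    # Copy each frame so the input spectrogram is left untouched.
    masked = [list(frame) for frame in spec]
    for _ in range(n_masks):
        width = rng.randint(0, max_width)         # inclusive bounds
        start = rng.randint(0, n_frames - width)  # mask stays in range
        for t in range(start, start + width):
            masked[t] = [0.0] * len(masked[t])    # zero out the whole frame
    return masked


# Example: a 100-frame utterance with 80 frequency bins per frame.
utterance = [[1.0] * 80 for _ in range(100)]
augmented = adaptive_time_mask(utterance, rng=random.Random(0))
```

With these assumed ratios, a 100-frame utterance receives at most 4 masks of up to 4 frames each, while a 1000-frame utterance receives up to 20 masks of up to 40 frames each. A fixed-size policy would mask the same number of frames in both cases, which is proportionally much lighter on long utterances.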

Tags: SPS conference, ICASSP 2020 virtual conference, May 2020, ICASSP 2020
