Investigation of SpecAugment for Deep Speaker Embedding Learning

Shuai Wang, Johan Rohdin, Oldřich Plchot, Lukáš Burget, Honza Cernocky, Kai Yu


Length: 34:33

04 May 2020

SpecAugment is a recently proposed data augmentation method for speech recognition. By randomly masking bands in the log Mel spectrogram, this method leads to impressive performance improvements. In this paper, we investigate the use of SpecAugment for speaker verification tasks. Two different models, namely a 1-D convolutional TDNN and a 2-D convolutional ResNet34, trained with either Softmax or AAM-Softmax loss, are used to analyze SpecAugment's effectiveness. Experiments are carried out on the VoxCeleb and NIST SRE 2016 datasets. By applying SpecAugment to the original clean data on the fly, without complex offline data augmentation methods, we obtained 3.72% and 11.49% EER for NIST SRE 2016 Cantonese and Tagalog, respectively. On the VoxCeleb1 evaluation set, we obtained 1.47% EER.
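The band masking mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name spec_augment, the maximum mask widths, the use of a single mask per axis, and the fill value of zero (which assumes mean-normalized features) are all assumptions chosen for illustration.

```python
import numpy as np


def spec_augment(log_mel, max_freq_width=8, max_time_width=10, rng=None):
    """Return a masked copy of a log Mel spectrogram of shape (n_mels, n_frames).

    One random frequency band and one random time band are set to zero.
    The widths and the zero fill value are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = log_mel.copy()  # leave the caller's features untouched
    n_mels, n_frames = out.shape

    # Frequency mask: width f in [0, max_freq_width], start f0 in [0, n_mels - f].
    f = int(rng.integers(0, min(max_freq_width, n_mels) + 1))
    f0 = int(rng.integers(0, n_mels - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time mask: width t in [0, max_time_width], start t0 in [0, n_frames - t].
    t = int(rng.integers(0, min(max_time_width, n_frames) + 1))
    t0 = int(rng.integers(0, n_frames - t + 1))
    out[:, t0:t0 + t] = 0.0

    return out


if __name__ == "__main__":
    # Hypothetical 40-band, 200-frame feature matrix standing in for real data.
    features = np.random.default_rng(0).standard_normal((40, 200))
    masked = spec_augment(features, rng=np.random.default_rng(1))
    print(masked.shape)
```

Because the masks are drawn anew on every call, applying the function inside the data loader gives each training example a different masking pattern per epoch, which is one way to read the on-the-fly usage described above.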

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020

