Fast Training Of Deep Neural Networks For Speech Recognition

Guojing Cong, Brian Kingsbury, Tianyi Liu, Chih-Chieh Yang

SPS Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 13:26

04 May 2020

Training large, deep neural network acoustic models for speech recognition on large datasets takes a long time on a single GPU, motivating research on parallel training algorithms. We present an approach for training a bidirectional LSTM acoustic model on the 2000-hour Switchboard corpus. The trained model achieves state-of-the-art word error rates of 7.5% on the Hub5-2000 Switchboard test set and 13.1% on the Callhome test set, and training scales to an unprecedented 96 learners while employing only 12 global reductions per epoch. Because our implementation incurs far fewer reductions than prior work, it does not require aggressively optimized communication primitives to reach state-of-the-art performance in a short amount of time. With 48 NVIDIA V100 GPUs, training takes 5 hours; with 96 GPUs, it takes around 3 hours.
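The two training times quoted in the abstract allow a rough estimate of how well training scales from 48 to 96 GPUs. The following is a minimal sketch of that arithmetic only. The function name `scaling_summary` is hypothetical, and the 3-hour figure is treated as exact even though the abstract gives it as approximate, so the results are indicative rather than measured.

```python
def scaling_summary(gpus_a, hours_a, gpus_b, hours_b):
    """Compare two training runs that differ in GPU count.

    Hypothetical helper for illustration; not taken from the paper.
    """
    speedup = hours_a / hours_b        # how much faster the larger run is
    ideal_speedup = gpus_b / gpus_a    # speedup under perfect linear scaling
    return {
        "speedup": speedup,
        "ideal_speedup": ideal_speedup,
        "efficiency": speedup / ideal_speedup,  # fraction of ideal achieved
        "gpu_hours_a": gpus_a * hours_a,        # total compute, smaller run
        "gpu_hours_b": gpus_b * hours_b,        # total compute, larger run
    }


# Figures from the abstract: 48 GPUs in 5 hours, 96 GPUs in around 3 hours.
summary = scaling_summary(48, 5.0, 96, 3.0)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, doubling the GPU count gives a speedup of roughly 1.67x against an ideal of 2x, which corresponds to a scaling efficiency of about 83% and an increase in total compute from 240 to 288 GPU-hours.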

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020

Value-Added Bundle(s) Including this Product

ICASSP 2020 Virtual Conference - Presentation Videos Product Bundle
