04 May 2020

Adam has recently become a popular stochastic optimization method in deep learning. To parallelize Adam in a distributed system, the synchronous stochastic gradient (SSG) technique is widely used, but it is inefficient due to its heavy communication cost. In this paper, we instead parallelize Adam with blockwise model-update filtering (BMUF). BMUF synchronizes model updates periodically and introduces a block momentum to improve performance. We propose a novel way to modify Adam's estimated moment buffers and devise a simple yet effective trick for hyper-parameter setting under the BMUF framework. Experimental results on a large-scale English optical character recognition (OCR) task and a large-vocabulary continuous speech recognition (LVCSR) task show that BMUF-Adam achieves an almost linear speedup without recognition accuracy degradation and outperforms the SSG-based method in terms of speedup, scalability, and recognition accuracy.
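
The abstract's description of BMUF (periodic synchronization of model updates plus a block momentum applied on top of local Adam training) can be illustrated with a minimal single-process sketch. Everything below is an illustrative assumption: the toy least-squares problem, the simulated workers, and all hyper-parameter values are hypothetical, and the paper's proposed moment-buffer modification and hyper-parameter trick are not reproduced here.

```python
# Minimal single-process sketch of BMUF layered on local Adam training.
# Hypothetical setup: K simulated workers on a toy least-squares problem.
# The paper's specific Adam moment-buffer modification is NOT reproduced here.
import numpy as np

def adam_block(w, data, steps, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Run `steps` local Adam updates on a least-squares loss; return new weights."""
    X, y = data
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        g = 2.0 * X.T @ (X @ w - y) / len(y)      # gradient of mean squared error
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

rng = np.random.default_rng(0)
dim, n_workers, block_steps, n_blocks = 10, 4, 50, 20
w_true = rng.normal(size=dim)
# Each simulated worker holds its own shard of synthetic data.
shards = []
for _ in range(n_workers):
    X = rng.normal(size=(256, dim))
    shards.append((X, X @ w_true + 0.01 * rng.normal(size=256)))

w_global = np.zeros(dim)        # global model W(t)
delta = np.zeros(dim)           # block-level update Delta(t)
block_momentum = 0.9            # block momentum (illustrative value)
block_lr = 1.0                  # block learning rate (illustrative value)

for _ in range(n_blocks):
    # Each worker starts from the global model and trains locally for one block.
    locals_w = [adam_block(w_global.copy(), shard, block_steps) for shard in shards]
    # Aggregated block "gradient": averaged local models minus the global model.
    g_block = np.mean(locals_w, axis=0) - w_global
    # Blockwise model-update filtering: momentum-smoothed block update.
    delta = block_momentum * delta + block_lr * g_block
    w_global = w_global + delta

print("distance to w_true:", np.linalg.norm(w_global - w_true))
```

Because workers exchange parameters only once per block rather than per mini-batch, communication cost drops roughly by a factor of the block size, which is the source of the near-linear speedup claimed above; the block momentum compensates for the information lost by synchronizing infrequently.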
