Overlap Local-Sgd: An Algorithmic Approach To Hide Communication Delays In Distributed Sgd

Jianyu Wang, Hao Liang, Gauri Joshi

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 24:04

04 May 2020

Distributed stochastic gradient descent (SGD) is essential for scaling the machine learning algorithms to a large number of computing nodes. However, the infrastructures variability such as high communication delay or random node slowdown greatly impedes the performance of distributed SGD algorithm, especially in a wireless system or sensor networks. In this paper, we propose an algorithmic approach named Overlap Local-SGD (and its momentum variant) to overlap communication and computation so as to speedup the distributed training procedure. The approach can help to mitigate the straggler effects as well. We achieve this by adding an anchor model on each node. After multiple local updates, locally trained models will be pulled back towards the synchronized anchor model rather than communicating with others. Experimental results of training a deep neural network on CIFAR-10 dataset demonstrate the effectiveness of Overlap Local-SGD. We also provide a convergence guarantee for the proposed algorithm under non-convex objective functions.

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020

Overlap Local-Sgd: An Algorithmic Approach To Hide Communication Delays In Distributed Sgd

Jianyu Wang, Hao Liang, Gauri Joshi

Value-Added Bundle(s) Including this Product

ICASSP 2020 Virtual Conference - Presentation Videos Product Bundle

More Like This

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

IEEE ICASSP 2024, 1 4-19 April 2024, Seoul, Korea. Conference Presentation Videos Bundle

ICIP 2022, October 16-19, 2022, Bordeaux, France - Presentation Videos Product Bundle

Join the IEEE Signal Processing Society