GRADIENT STALENESS IN ASYNCHRONOUS OPTIMIZATION UNDER RANDOM COMMUNICATION DELAYS
Haider Al-Lawati, Stark Draper
SPS
Length: 00:15:44
Distributed optimization is widely used to solve large-scale optimization problems by parallelizing gradient-based algorithms across multiple computing nodes. In asynchronous optimization, the optimization parameter is updated using stale gradients, that is, gradients computed with respect to out-of-date parameters. Although large staleness can slow convergence, little is known about the impact of staleness and how it relates to other system parameters. In this work, we study and analyze centralized asynchronous optimization. We show that the process by which gradients arrive at the master node resembles a renewal process. We derive bounds on the expected staleness and show how it depends on other system parameters, such as the number of workers, the expected compute time, and the communication delays. Our derivations can be used in existing convergence analyses to express convergence rates in terms of other known system parameters. Such expressions clarify which factors affect convergence.
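To make the notion of staleness concrete, here is a small event-driven simulation of a centralized asynchronous system. It is a hypothetical sketch, not the authors' model or code. It assumes that every worker always has exactly one gradient job in flight. Each round trip is assumed to be a downlink delay plus a compute time plus an uplink delay, all drawn i.i.d. from exponential distributions. The master is assumed to apply each arriving gradient immediately and return the new parameter to the same worker. Staleness is counted as the number of master updates between the moment a parameter copy is sent and the moment the resulting gradient is applied. The function name `simulate_staleness` and its parameters are illustrative assumptions.

```python
import heapq
import random


def simulate_staleness(n_workers, n_updates, mean_compute=1.0,
                       mean_comm=0.5, seed=0):
    """Hypothetical sketch of centralized asynchronous optimization.

    Assumptions (for illustration only):
      * every worker always has exactly one gradient job in flight;
      * a round trip is downlink delay + compute time + uplink delay,
        each drawn i.i.d. from an exponential distribution;
      * the master applies each arriving gradient immediately and sends
        the new parameter version back to the worker that produced it.
    Returns one staleness value per applied gradient.
    """
    rng = random.Random(seed)

    def round_trip():
        # expovariate takes a rate, so pass 1/mean for each delay.
        return (rng.expovariate(1.0 / mean_comm)       # downlink
                + rng.expovariate(1.0 / mean_compute)  # compute
                + rng.expovariate(1.0 / mean_comm))    # uplink

    version = 0  # number of updates the master has applied so far
    # Event queue of (arrival_time, worker_id, version_read).
    events = [(round_trip(), w, version) for w in range(n_workers)]
    heapq.heapify(events)

    staleness = []
    while len(staleness) < n_updates:
        t, w, version_read = heapq.heappop(events)
        # Updates applied since this worker's parameter copy was sent.
        staleness.append(version - version_read)
        version += 1  # master applies the (stale) gradient
        # Send the fresh parameter back to the same worker.
        heapq.heappush(events, (t + round_trip(), w, version))
    return staleness


if __name__ == "__main__":
    s = simulate_staleness(n_workers=8, n_updates=20000)
    print("mean staleness:", sum(s) / len(s))
```

In this particular sketch, each update is counted once in the staleness of each of the other n - 1 jobs still in flight. The long-run mean staleness therefore approaches, and never exceeds, `n_workers - 1` whatever delay distributions are chosen. The delay parameters mainly change how staleness values spread around that mean. This is a property of the sketch's assumptions and is not offered as the paper's derived bound.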