On Distributed Stochastic Gradient Descent For Nonconvex Functions In The Presence Of Byzantines
Saikiran Bulusu, Prashant Khanduri, Pranay Sharma, Pramod Varshney
We consider the distributed stochastic optimization problem of minimizing a nonconvex function $f$ in an adversarial setting. All the $w$ worker nodes in the network are expected to send their stochastic gradient vectors to the fusion center (or server). However, at most an $\alpha$-fraction of the nodes may be Byzantine and may send arbitrary vectors instead. A vanilla implementation of distributed stochastic gradient descent (SGD) cannot handle such misbehavior from the nodes. We propose a robust variant of distributed SGD that is resilient to the presence of Byzantines. The fusion center employs a novel filtering rule that identifies and removes the Byzantine nodes. We show that $T = \tilde{O}\left( \frac{1}{w \epsilon^2} + \frac{\alpha^2}{\epsilon^2} \right)$ iterations are needed to achieve an $\epsilon$-approximate stationary point ($x$ such that $\|\nabla f(x)\|^2\leq \epsilon$) for the nonconvex learning problem. Unlike other existing approaches, the iteration complexity of the proposed algorithm is independent of the problem dimension.
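The abstract does not specify the paper's filtering rule. Purely as an illustration of the server-side structure of Byzantine-resilient distributed SGD, the sketch below assumes a generic distance-based filter: the fusion center drops the reported gradients farthest from the coordinate-wise median and averages the rest before taking a step. The function name `robust_sgd_step` and the use of the median as a reference point are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch (not the paper's specific filtering rule): one fusion-center
# update that discards suspected Byzantine gradients and averages the survivors.
import numpy as np

def robust_sgd_step(x, grads, alpha, lr):
    """One server-side SGD step with a simple distance-based filter.

    x     : current iterate, shape (d,)
    grads : list of w gradient vectors reported by the workers, each shape (d,)
    alpha : assumed upper bound on the fraction of Byzantine workers
    lr    : learning rate
    """
    G = np.stack(grads)                         # (w, d) matrix of reports
    w = G.shape[0]
    center = np.median(G, axis=0)               # robust reference point (assumption)
    dists = np.linalg.norm(G - center, axis=1)  # distance of each report from it
    # Keep the (1 - alpha) fraction of reports closest to the reference.
    keep = np.argsort(dists)[: int(np.ceil((1 - alpha) * w))]
    g_hat = G[keep].mean(axis=0)                # filtered aggregate gradient
    return x - lr * g_hat
```

In this sketch, honest workers' stochastic gradients concentrate around the true gradient, so vectors reported by Byzantine nodes that deviate substantially are filtered out before aggregation; the actual rule and its analysis are given in the paper.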