Adaptive Multi-Hierarchical signSGD for Communication-Efficient Distributed Optimization
Haibo Yang, Xin Zhang, Minghong Fang, Jia Liu
In this work, we investigate a communication-efficient multi-hierarchical signSGD (MH-signSGD) algorithm with an adaptive learning rate. Under the assumption that the stochastic gradient distribution is symmetric, we show that, without any learning rate tuning, our proposed MH-signSGD matches the state-of-the-art sublinear convergence rate \(O(1/\sqrt{K})\) in nonconvex settings, where \(K\) is the number of iterations. Our adaptive learning rate strategy stochastically approximates the learning rate obtained by greedily minimizing an error upper bound between two successive iterations. Moreover, by leveraging a normal approximation technique to characterize the stochastic gradient sign error, we sharpen the convergence analysis of MH-signSGD with a fixed learning rate \(1/\sqrt{K}\) and establish a strong result in the large-system regime: MH-signSGD asymptotically converges to a stationary point at rate \(O(1/\sqrt{M})\), where \(M\) is the number of workers. In comparison, most existing work on signSGD can only establish weaker convergence to a finite-size neighborhood of a stationary point in the large-system regime. We validate our theoretical results experimentally on both synthetic and real-world datasets.
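To make the hierarchical sign-based update concrete, the following is a minimal sketch of a generic two-level signSGD step with majority voting (workers vote within groups, groups vote at the server), not the authors' exact MH-signSGD or their adaptive learning rate rule; the toy gradient oracle, group sizes, and fixed \(1/\sqrt{K}\) step size are assumptions made purely for illustration.

```python
import numpy as np

def stochastic_grad(x, rng):
    # Hypothetical stochastic gradient oracle for a toy quadratic objective.
    return x + rng.normal(scale=1.0, size=x.shape)

def hierarchical_signsgd_step(x, group_sizes, lr, rng):
    """One generic two-level signSGD step with majority voting:
    each worker sends sign(grad), each group majority-votes over its
    workers, and the server majority-votes over the group votes."""
    group_votes = []
    for num_workers in group_sizes:              # level 1: workers -> group
        signs = [np.sign(stochastic_grad(x, rng)) for _ in range(num_workers)]
        group_votes.append(np.sign(np.sum(signs, axis=0)))
    global_vote = np.sign(np.sum(group_votes, axis=0))  # level 2: groups -> server
    return x - lr * global_vote                  # signed (1-bit) update

rng = np.random.default_rng(0)
x = rng.normal(size=10)
K = 1000                                         # number of iterations
for _ in range(K):
    x = hierarchical_signsgd_step(x, group_sizes=[4, 4, 4],
                                  lr=1.0 / np.sqrt(K), rng=rng)
print("final ||x|| =", np.linalg.norm(x))
```

Since only signs are transmitted at each level, every worker-to-group and group-to-server message is one bit per coordinate, which is the source of the communication savings discussed above.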