ADAPTIVE STEP-SIZE METHODS FOR COMPRESSED SGD
Adarsh Muthuveeru-Subramaniam (University of Illinois at Urbana-Champaign); Akshayaa Magesh (University of Illinois at Urbana-Champaign); Venugopal V. Veeravalli (University of Illinois at Urbana-Champaign)
Compressed Stochastic Gradient Descent (SGD) algorithms have been proposed to address the communication bottleneck in distributed and decentralized optimization problems such as federated machine learning. Many existing compressed SGD algorithms use non-adaptive step-sizes (constant or diminishing) to provide theoretical convergence guarantees. Since non-adaptive step-sizes typically involve unknown system parameters, the step-sizes are fine-tuned in practice to obtain good empirical performance for the given dataset and learning model. Such fine-tuning might be impractical in many scenarios. Thus, it is of interest to study compressed SGD using adaptive step-sizes. Motivated by prior work that uses adaptive step-sizes for uncompressed SGD, we develop an Armijo rule based step-size selection method for compressed SGD. In particular, we introduce a scaling technique for the descent step, which we use to establish order-optimal convergence rates for convex-smooth and strongly convex-smooth objectives under an interpolation condition, and for non-convex objectives under a strong growth condition. We present experimental results on deep neural networks trained on real-world datasets, and compare the performance of our proposed algorithm with state-of-the-art compressed SGD methods to demonstrate improved performance at various levels of compression.
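To give a rough sense of the kind of procedure the abstract describes, the sketch below pairs a top-k gradient compressor with a standard stochastic Armijo backtracking condition. The compressor choice, the acceptance test, and the constants (eta_max, beta, c) are illustrative assumptions, not the paper's exact scaled descent rule.

```python
import numpy as np


def top_k_compress(g, k):
    """Keep the k largest-magnitude entries of g and zero the rest
    (one common gradient compressor; the paper's compressor may differ)."""
    idx = np.argsort(np.abs(g))[-k:]
    out = np.zeros_like(g)
    out[idx] = g[idx]
    return out


def armijo_step(w, loss_fn, grad, eta_max=1.0, beta=0.5, c=0.1, max_backtracks=20):
    """Backtracking (Armijo) line search along the negative (compressed) gradient.

    Accepts the largest eta = eta_max * beta^i satisfying
        f(w - eta * grad) <= f(w) - c * eta * ||grad||^2,
    a standard stochastic Armijo condition; the scaled condition
    used in the paper may differ.
    """
    f0 = loss_fn(w)
    g_norm_sq = np.dot(grad, grad)
    eta = eta_max
    for _ in range(max_backtracks):
        candidate = w - eta * grad
        if loss_fn(candidate) <= f0 - c * eta * g_norm_sq:
            return candidate, eta
        eta *= beta
    # Fall back to the smallest trial step if no candidate was accepted.
    return w - eta * grad, eta


if __name__ == "__main__":
    # Toy least-squares problem to exercise the two pieces together.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 10))
    b = rng.standard_normal(50)
    loss = lambda w: 0.5 * np.mean((A @ w - b) ** 2)

    w = np.zeros(10)
    for _ in range(100):
        full_grad = A.T @ (A @ w - b) / A.shape[0]
        g = top_k_compress(full_grad, k=3)  # communicate only 3 coordinates
        w, eta = armijo_step(w, loss, g)
    print(f"final loss {loss(w):.4f}, last accepted step size {eta:.3f}")
```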