M22: RATE-DISTORTION INSPIRED GRADIENT COMPRESSION
Yangyi Liu (McMaster University); Sadaf Salehkalaibar (McMaster University); Stefano Rini (NYCU); Jun Chen (McMaster University)
In federated learning (FL), communication between the remote learners and the parameter server (PS) is a crucial bottleneck. This paper proposes M22, a rate-distortion inspired approach to compressing the model updates exchanged during distributed training of deep neural networks (DNNs). In particular, (i) we propose a family of distortion measures referred to as the M-magnitude weighted L2 norm, and (ii) we model the gradient entries as i.i.d. samples from a two-parameter distribution, either generalized normal or Weibull. To measure gradient compression performance under a communication constraint, we define the per-bit accuracy as the optimal improvement in accuracy that one bit of communication brings to the centralized model over the training period. Using this performance measure, we systematically benchmark the choices of gradient distribution and distortion measure, provide substantial insight into the role of these choices, and argue that significant performance improvements can be attained with such a rate-distortion inspired compressor.
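The two modeling ingredients above can be illustrated with a short sketch. The code below fits the two assumed two-parameter families (generalized normal and Weibull) to synthetic heavy-tailed samples standing in for gradient entries, and evaluates one plausible reading of the M-magnitude weighted L2 norm: squared reconstruction error weighted by the entry's magnitude raised to the power M. The synthetic data, the `m_weighted_l2` helper, the choice M = 1, and the 1-bit sign compressor are all illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np
from scipy.stats import gennorm, weibull_min

# Stand-in for one round of model updates: heavy-tailed synthetic samples
# (real DNN gradients are not available here, so this is only illustrative).
rng = np.random.default_rng(0)
grads = rng.standard_t(df=3, size=5_000)

# Fit the two assumed two-parameter families: generalized normal (shape
# beta; beta = 2 recovers the Gaussian) on the values, and Weibull
# (shape k) on the non-negative magnitudes.
beta, _, _ = gennorm.fit(grads)
k, _, _ = weibull_min.fit(np.abs(grads), floc=0.0)

# Hypothetical reading of the M-magnitude weighted L2 norm: weight each
# squared reconstruction error by the entry's magnitude to the power M,
# so large-magnitude entries dominate the distortion.
def m_weighted_l2(x, x_hat, M):
    return float(np.mean(np.abs(x) ** M * (x - x_hat) ** 2))

# Compare a crude 1-bit sign compressor against sending nothing, under M = 1.
scale = np.mean(np.abs(grads))
d_sign = m_weighted_l2(grads, np.sign(grads) * scale, M=1)
d_zero = m_weighted_l2(grads, np.zeros_like(grads), M=1)
print(f"beta={beta:.2f}, k={k:.2f}, d_sign={d_sign:.3f}, d_zero={d_zero:.3f}")
```

Under this weighting, the sign compressor yields strictly lower distortion than sending nothing, which is the kind of comparison the per-bit accuracy measure formalizes at the level of trained-model accuracy.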