Online Model Compression for Federated Learning with Large Models
Tien-Ju Yang (Google); Yonghui Xiao (Google); Giovanni Motta (Google); Françoise Beaufays (Google); Rajiv Mathews (Google); Mingqing Chen (Google)
This paper addresses the challenges of training large neural networks under federated learning settings: high on-device memory usage and communication cost. The proposed Online Model Compression (OMC) provides a framework that stores model parameters in a compressed format and decompresses them only when needed. We use quantization as the compression scheme in this paper and propose three techniques, (1) per-variable transformation, (2) weight-matrix-only quantization, and (3) partial variable quantization, to minimize its impact on model accuracy. Our experiments on two recent neural networks for speech recognition and two different datasets show that OMC can reduce the memory usage and communication cost of model parameters by up to 59% while attaining comparable accuracy and training speed relative to full-precision federated learning.
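To make the idea concrete, below is a minimal sketch of how an online-compressed variable might work: the tensor is kept quantized (uint8 here) and dequantized only when read, with a per-variable affine transformation (scale and offset). All class and function names are hypothetical illustrations, not the authors' implementation.

```python
# Illustrative sketch only; names and quantization details are assumptions,
# not the paper's actual implementation.
import numpy as np

class CompressedVariable:
    """Stores a tensor in quantized form; decompresses only when accessed."""

    def __init__(self, values: np.ndarray, num_bits: int = 8):
        self.num_bits = num_bits
        self._quantize(values.astype(np.float32))

    def _quantize(self, values: np.ndarray) -> None:
        # Per-variable affine transformation: one scale/offset per variable,
        # mapping the variable's value range onto [0, 2^bits - 1].
        levels = (1 << self.num_bits) - 1
        self.offset = float(values.min())
        self.scale = max(float(values.max()) - self.offset, 1e-12) / levels
        codes = np.round((values - self.offset) / self.scale)
        self.codes = codes.astype(np.uint8)  # compressed storage

    def read(self) -> np.ndarray:
        # Decompress on demand, e.g. just before a forward/backward pass.
        return self.codes.astype(np.float32) * self.scale + self.offset

    def write(self, values: np.ndarray) -> None:
        # Re-compress updated values, e.g. after a local optimizer step.
        self._quantize(values.astype(np.float32))

# Usage: weight matrices stay compressed between uses, cutting their
# in-memory footprint relative to full-precision float32 storage.
w = CompressedVariable(np.random.randn(256, 256).astype(np.float32))
dense_w = w.read()          # decompressed for computation
w.write(dense_w - 0.01)     # updated and stored compressed again
```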