Batch Normalization Damages Federated Learning on Non-IID Data: Analysis and Remedy
Yanmeng Wang (The Chinese University of Hong Kong, Shenzhen); Qingjiang Shi (Tongji University); Tsung-Hui Chang (The Chinese University of Hong Kong, Shenzhen)
Batch normalization (BN) has been widely used to accelerate the training of deep neural networks. However, recent findings show that, in federated learning (FL) scenarios, BN can damage learning performance when clients hold non-i.i.d. data. While several FL schemes have been proposed to address this issue, they still suffer a significant performance loss compared to the centralized scheme. Moreover, none of them analytically explains how BN affects FL convergence. In this paper, we present the first convergence analysis showing that the mismatch between local and global statistical parameters caused by non-i.i.d. data induces gradient deviation, which drives the algorithm to converge to a biased solution at a slower rate. To remedy this, we further propose a new FL algorithm, called FedTAN, which is based on an iterative layer-wise parameter aggregation procedure. Experimental results demonstrate the superiority of FedTAN.
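To illustrate the underlying idea, the following is a minimal PyTorch sketch of aggregating BatchNorm statistics layer by layer across clients so that all clients share the same global mean and variance. It is not the authors' exact FedTAN procedure (which is specified in the paper); the model structure, the uniform client weighting, and all names below are illustrative assumptions.

```python
# Minimal sketch: layer-wise aggregation of BN running statistics across clients.
# Assumes identical model architectures on all clients and uniform client weights.
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())

clients = [make_model() for _ in range(4)]   # local copies held by 4 clients
global_model = make_model()                  # server-side model

# Aggregate BN statistics layer by layer on the server.
for name, module in global_model.named_modules():
    if isinstance(module, nn.BatchNorm2d):
        local_bns = [dict(c.named_modules())[name] for c in clients]
        # Global mean: average of the local running means.
        global_mean = torch.stack([bn.running_mean for bn in local_bns]).mean(dim=0)
        # Global variance: average local second moments minus squared global mean,
        # so every client ends up normalizing with one consistent (mean, var) pair.
        second_moment = torch.stack(
            [bn.running_var + bn.running_mean ** 2 for bn in local_bns]
        ).mean(dim=0)
        global_var = second_moment - global_mean ** 2
        module.running_mean.copy_(global_mean)
        module.running_var.copy_(global_var.clamp(min=0.0))

# Broadcast the aggregated global model (including BN statistics) back to clients.
for c in clients:
    c.load_state_dict(global_model.state_dict())
```

In this sketch the server reconciles the local BN statistics that would otherwise diverge under non-i.i.d. data; FedTAN's actual iterative procedure and its convergence guarantees are given in the paper.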