Training Large-Vocabulary Neural Language Models by Private Federated Learning for Resource-Constrained Devices
Mingbin Xu (Apple); Congzheng Song (Apple); Ye Tian (Apple); Neha Agrawal (Apple); Filip Granqvist (Apple); Rogier C van Dalen (Samsung AI Center, Cambridge, UK); Xiao Zhang (Apple); Arturo Argueta (Apple); Shiyi Han (Apple); Yaqiao Deng (Apple); Leo Liu (Apple); Anmol Walia (Apple); Alex Jin (Apple)
Federated Learning (FL) is a technique for training models on distributed edge devices using their local data samples.
Differential Privacy (DP) can be combined with FL to provide a formal privacy guarantee for sensitive on-device data. Our goal is to train a large neural network language model (NNLM) on compute-constrained devices while preserving privacy using FL and DP. However, the DP noise added to the model grows with the model size, which often prevents convergence.
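To make the scaling issue concrete, the following minimal sketch assumes the standard DP-FL recipe of per-update L2 clipping plus per-coordinate Gaussian noise (an assumption for illustration, not necessarily the exact mechanism used in this work). For a fixed clipping bound, the norm of the clipped update stays constant while the norm of the added noise grows with the square root of the number of parameters, so the signal-to-noise ratio of each aggregated update degrades as the model gets larger.

import numpy as np

# Illustrative values only: clipping bound C and noise multiplier sigma.
rng = np.random.default_rng(0)
clip_norm = 1.0
noise_mult = 1.0

for dim in (10_000, 1_000_000, 10_000_000):
    # Synthetic client update, clipped to L2 norm C.
    update = rng.normal(size=dim)
    update *= clip_norm / np.linalg.norm(update)
    # Per-coordinate Gaussian noise with std sigma * C.
    noise = rng.normal(scale=noise_mult * clip_norm, size=dim)
    # Ratio of update norm to noise norm, roughly 1 / (sigma * sqrt(dim)).
    snr = np.linalg.norm(update) / np.linalg.norm(noise)
    print(f"dim={dim:>10,}  signal-to-noise ratio ~ {snr:.1e}")

Shrinking the transmitted payload, as the techniques below do, therefore improves this ratio directly under the same privacy budget.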
We propose Partial Embedding Updates (PEU), a novel technique that reduces the impact of DP noise by decreasing the payload size. Furthermore, we adopt Low-Rank Adaptation (LoRA) and Noise Contrastive Estimation (NCE) to reduce the memory demands of large models on compute-constrained devices. We demonstrate, in simulation and on real devices, that this combination of techniques makes it possible to train large-vocabulary language models while preserving accuracy and privacy.
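As a concrete sketch of the parameter-efficiency idea behind LoRA, the snippet below wraps a frozen PyTorch linear layer with a trainable low-rank adapter; the class name, rank, and scaling are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # pretrained weights stay fixed
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only lora_a and lora_b are trainable, so only they need to be
        # updated on device and sent back for aggregation.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(1024, 1024), rank=8)
out = layer(torch.randn(4, 1024))  # ~16K trainable adapter weights vs. ~1M frozen ones

Training and transmitting only such adapter parameters (and, with PEU, only a subset of embedding rows) keeps both the on-device memory footprint and the noised federated payload small.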