Principal component analysis (PCA) is a quintessential data preprocessing tool in many machine learning applications. However, the high dimensionality and massive scale of the data in several of these applications mean that traditional centralized PCA solutions are fast becoming irrelevant. Distributed PCA, in which a multitude of interconnected computing devices collaborate to obtain the principal components of the data, is a typical approach to overcoming the limitations of centralized PCA. This talk focuses on the distributed PCA problem when the data are distributed among computing devices whose interconnections form an ad-hoc topology. Such a setup, which arises in the Internet-of-Things, vehicular networks, mobile edge computing, and similar settings, has been considered in a few recent works on distributed PCA. But the resulting solutions either overlook the uncorrelated feature learning aspect of the PCA problem, tend to have high communications overhead that makes them unscalable, and/or lack 'exact' or 'global' convergence guarantees. To overcome these limitations, this talk introduces two closely related variants of a new and scalable distributed PCA algorithm, termed FAST-PCA (Fast and exAct diSTributed PCA), which is communication-efficient because of its one-time-scale nature. The proposed FAST-PCA algorithm is theoretically shown to converge linearly and exactly to the principal components, yielding dimension reduction as well as uncorrelated features for machine learning, while extensive numerical experiments on both synthetic and real data highlight its superiority over existing distributed PCA algorithms.
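To make concrete the two PCA properties the abstract emphasizes (dimension reduction and uncorrelated features), the following is a minimal centralized PCA sketch in NumPy. It is not the FAST-PCA algorithm from the talk; the synthetic data and the choice of two components are illustrative assumptions.

```python
import numpy as np

# Synthetic correlated data: 500 samples, 5 features (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 5))
X -= X.mean(axis=0)  # center the data

# Centralized PCA via eigendecomposition of the sample covariance matrix.
cov = X.T @ X / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # reorder to descending
components = eigvecs[:, order[:2]]       # top-2 principal components

# Projection onto the principal subspace: fewer dimensions, and the
# projected features are mutually uncorrelated (diagonal covariance).
Z = X @ components
Z_cov = Z.T @ Z / (len(Z) - 1)
off_diag = Z_cov - np.diag(np.diag(Z_cov))
print(Z.shape)                           # reduced from 5 features to 2
print(np.max(np.abs(off_diag)))          # near zero: uncorrelated features
```

A distributed algorithm such as the one discussed in the talk must recover these same components when each device holds only a portion of `X` and communicates solely with its network neighbors.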