A Gaussian Latent Variable Model for Incomplete Mixed Type Data
Marzieh Ajirak (Stony Brook University); Petar Djuric
SPS
In many machine learning problems, one has to work with data of different types, including continuous, discrete, and categorical data. Moreover, some of these values are often missing from the database. This paper proposes a Gaussian process framework that efficiently captures the information in mixed numerical and categorical data and effectively handles missing values. First, we propose a generative model for mixed-type data that uses Gaussian processes with kernels constructed from latent vectors. We then propose a method for inferring the unknowns, whose implementation relies on a sparse spectrum approximation of the Gaussian processes and on variational inference. We demonstrate the performance of the method on both supervised and unsupervised tasks: we first investigate the imputation of missing values in an unsupervised setting, and then show the results of joint imputation and classification on IBM employee data.
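To make the generative side concrete, here is a minimal sketch of how a Gaussian process latent variable model can produce mixed-type data with missing entries. The abstract does not specify the kernel, likelihoods, or link functions, so everything here is an assumption for illustration, not the authors' model: an RBF kernel on the latent vectors, approximated with random Fourier features (one common sparse spectrum construction); a Gaussian likelihood for continuous columns; a Poisson likelihood with a log link for a count column; a softmax link over one latent function per category; and values missing completely at random. The function names `random_fourier_features` and `sample_mixed_data` are hypothetical, and the variational inference step is not shown.

```python
import numpy as np


def random_fourier_features(X, num_features, lengthscale, rng):
    """Map latent vectors X (N x D) to 2*num_features random Fourier features.

    Phi @ Phi.T approximates the RBF kernel exp(-||x - x'||^2 / (2 * lengthscale**2)).
    """
    D = X.shape[1]
    # The RBF kernel's spectral density is Gaussian with std 1 / lengthscale.
    W = rng.normal(scale=1.0 / lengthscale, size=(D, num_features))
    Z = X @ W
    # cos/sin pairs make each feature product equal cos(w^T (x - x')).
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(num_features)


def sample_mixed_data(N, latent_dim=2, num_continuous=3, num_categories=4,
                      num_features=100, lengthscale=1.0, noise_std=0.1,
                      missing_rate=0.2, seed=0):
    """Draw a toy mixed-type dataset from an (assumed) GP latent variable model."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(N, latent_dim))        # latent vectors, N(0, I) prior
    Phi = random_fourier_features(X, num_features, lengthscale, rng)
    P = Phi.shape[1]

    # f = Phi @ w with w ~ N(0, I) is an approximate draw from GP(0, k).
    Y_cont = Phi @ rng.normal(size=(P, num_continuous))
    Y_cont += noise_std * rng.normal(size=Y_cont.shape)      # Gaussian likelihood

    counts = rng.poisson(np.exp(Phi @ rng.normal(size=P)))   # Poisson, log link

    logits = Phi @ rng.normal(size=(P, num_categories))      # one GP per category
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)                 # softmax link
    y_cat = np.array([rng.choice(num_categories, p=p) for p in probs])

    # Missing completely at random: NaN for continuous, -1 for count/categorical.
    Y_obs = np.where(rng.random(Y_cont.shape) < missing_rate, np.nan, Y_cont)
    counts_obs = np.where(rng.random(N) < missing_rate, -1, counts)
    y_cat_obs = np.where(rng.random(N) < missing_rate, -1, y_cat)
    return {"X": X, "Y_cont": Y_obs, "counts": counts_obs,
            "y_cat": y_cat_obs, "probs": probs}


data = sample_mixed_data(N=200)
print(data["Y_cont"].shape, np.isnan(data["Y_cont"]).mean())
```

The feature map is what makes such models tractable at scale: with 2M features, the N x N kernel matrix never has to be formed or inverted, and each latent function becomes a linear model in the features whose weights can be given a variational posterior.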