An Empirical Bayes Approach To Partially Labeled And Shuffled Data Sets
Alex Dytso, H. Vincent Poor
SPS
Length: 17:53
This work outlines a method for applying empirical Bayes to semi-supervised learning, that is, to a scenario in which the training set is partially or entirely unlabeled. In addition to missing labels, we consider a scenario in which the available training data may be shuffled (i.e., the features and labels are not matched). Specifically, we propose to train model-based empirical Bayes separately on the set of features and the set of labels, and to combine (mix) the two models based on the proportion of unlabeled pairs. The method can then be used to recover the missing labels of the data set (i.e., create pseudo-labels) and, if the data are shuffled, to recover the correct permutation of the data. The technique is evaluated for a multivariate Gaussian model and is shown to consistently outperform a maximum likelihood approach. Moreover, the procedure is shown to be a consistent estimator for a multivariate Gaussian model with an arbitrary (non-degenerate) covariance matrix.
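To make the setting concrete, the following is a minimal Python sketch of pseudo-labeling and permutation recovery under a Gaussian model. It is an illustration under stated assumptions, not the authors' estimator: it assumes a scalar feature and a scalar label, replaces the empirical Bayes mixing step with simple plug-in moment estimates, and uses the fact that, for scalars, matching in sorted order minimizes the total squared matching error. The function names `fit_gaussian_plugin`, `pseudo_label`, and `recover_permutation` are hypothetical.

```python
from statistics import fmean, pvariance


def fit_gaussian_plugin(xs_all, ys_all, labeled_pairs):
    """Plug-in moment estimates for a bivariate Gaussian (illustrative only).

    Marginal moments use every feature and every label, which is possible
    even when the data are shuffled. The cross-covariance can only come
    from the pairs whose matching is known.
    """
    mu_x, mu_y = fmean(xs_all), fmean(ys_all)
    var_x = pvariance(xs_all, mu_x)
    cov_xy = fmean((x - mu_x) * (y - mu_y) for x, y in labeled_pairs)
    return mu_x, mu_y, var_x, cov_xy


def pseudo_label(x, params):
    """Gaussian conditional mean E[Y | X = x] under the fitted moments."""
    mu_x, mu_y, var_x, cov_xy = params
    return mu_y + cov_xy / var_x * (x - mu_x)


def recover_permutation(xs, ys_shuffled, params):
    """Return perm such that ys_shuffled[perm[i]] is matched to xs[i].

    Assumption: scalar labels, so pairing the sorted predictions with the
    sorted labels minimizes the sum of squared differences.
    """
    preds = [pseudo_label(x, params) for x in xs]
    order_pred = sorted(range(len(preds)), key=preds.__getitem__)
    order_y = sorted(range(len(ys_shuffled)), key=ys_shuffled.__getitem__)
    perm = [0] * len(xs)
    for i, j in zip(order_pred, order_y):
        perm[i] = j
    return perm
```

For example, with features `[0, 1, 2, 3]`, shuffled labels `[7, 1, 5, 3]`, and two known pairs `(0, 1)` and `(1, 3)`, the fitted slope is 2, the pseudo-label for `x = 2` is 5, and the recovered matching reorders the labels to `[1, 3, 5, 7]`. In the multivariate case the sorted matching would need to be replaced by a general assignment solver.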