Preservation Of Anomalous Subgroups On Variational Autoencoder Transformed Data
Samuel C. Maina, Robert-Florian Samoilescu, Komminist Weldemariam, Reginald E. Bryant, Kush R. Varshney, William Ogallo, Skyler Speakman, Celia Cintas, Aisha Walcott-Bryant
SPS
Length: 11:50
We investigate the effect of variational autoencoder (VAE) based anonymization on anomalous subgroup preservation. In particular, we train a binary classifier to discover the most anomalous subgroup in a dataset by maximizing the bias between the group's predicted odds ratio from the model and the observed odds ratio from the data. We then perform anonymization using a VAE to synthesize an entirely new dataset that would ideally be drawn from the distribution of the original data. Finally, we repeat the anomalous subgroup discovery task on the new data and compare the result to the subgroup identified before anonymization. We evaluated our approach using two publicly available datasets from the financial industry. The evaluation confirmed that the approach produced synthetic datasets that preserved a high level of the subgroup differentiation identified in the original dataset. This differentiation was maintained even though the records in the synthetic datasets were distinctly different from those in the original datasets.
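As a rough illustration of the subgroup discovery step, the sketch below scores each single-feature subgroup by the absolute log ratio of its observed odds to its predicted odds and returns the highest-scoring one. This is a simplified stand-in under stated assumptions, not the method used in the work: the function names, the scoring formula, the restriction to single-feature subgroups, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def odds(p, eps=1e-9):
    """Convert a probability to odds, clamping to avoid division by zero."""
    p = min(max(p, eps), 1 - eps)
    return p / (1 - p)


def most_anomalous_subgroup(records, predicted, observed):
    """Return the (feature, value) subgroup with the largest odds mismatch.

    records   : list of dicts mapping feature name -> categorical value
    predicted : list of classifier probabilities, one per record
    observed  : list of 0/1 outcomes, one per record
    Hypothetical score: |log(observed odds / predicted odds)| per subgroup.
    """
    # Collect the record indices belonging to each single-feature subgroup.
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        for feat, val in rec.items():
            groups[(feat, val)].append(i)

    best, best_score = None, -1.0
    for key, idx in groups.items():
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        mean_obs = sum(observed[i] for i in idx) / len(idx)
        score = abs(math.log(odds(mean_obs) / odds(mean_pred)))
        if score > best_score:
            best, best_score = key, score
    return best, best_score


# Toy data (made up for illustration): the classifier underestimates region "A".
records = [{"region": "A" if i < 4 else "B", "tier": "x" if i % 2 == 0 else "y"}
           for i in range(8)]
predicted = [0.25] * 4 + [0.5] * 4
observed = [1, 1, 1, 0, 0, 1, 0, 1]

subgroup, score = most_anomalous_subgroup(records, predicted, observed)
print(subgroup, round(score, 3))
```

To mirror the comparison described above, the same function could be run once on the original data and once on the VAE-synthesized data, and the two returned subgroups compared.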