Privacy-Preserving Occupancy Estimation
Jennifer Williams (University of Southampton); Vahid Yazdanpanah (University of Southampton); Sebastian Stein (University of Southampton)
SPS
In this paper, we introduce an audio-based framework for occupancy estimation, together with a new public dataset, and evaluate it in a 'cocktail party' scenario simulated by mixing audio to produce speech with 1-10 overlapping talkers. To estimate the number of speakers in an audio clip, we explore five types of speech signal features and train several versions of our model using convolutional neural networks (CNNs). We further adapt the framework to be privacy-preserving by randomly perturbing audio frames to conceal speech content and speaker identity. We show that some of our privacy-preserving features perform better at occupancy estimation than the original waveforms. We analyse privacy further using two adversarial tasks: speaker recognition and speech recognition. Our privacy-preserving models estimate the number of speakers in the simulated cocktail party clips to within 1-2 persons, with a mean squared error (MSE) of 0.9-1.6, and achieve up to 34.9% classification accuracy while preserving speech content privacy. However, it is still possible for an attacker to identify individual speakers, which motivates further work in this area.
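To make the two signal-level ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the simulated cocktail party is an average of equal-length mono clips and that the random perturbation is a permutation of fixed-length frames; the abstract specifies neither detail. The function names (`mix_talkers`, `shuffle_frames`, `rmse_from_mse`), the frame length, and the sine-tone "talkers" are all hypothetical and chosen only for illustration.

```python
import math
import random


def mix_talkers(clips):
    """Average equal-length mono clips sample by sample (assumed mixing rule)."""
    n = len(clips)
    return [sum(samples) / n for samples in zip(*clips)]


def shuffle_frames(signal, frame_len, seed=0):
    """Split a signal into fixed-length frames and randomly permute their order.

    This is one possible 'random perturbation of audio frames'; it is an
    assumption for illustration, not necessarily the method used in the paper.
    """
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    rng.shuffle(frames)
    return [sample for frame in frames for sample in frame]


def rmse_from_mse(mse):
    """Convert a mean squared error into an error in units of persons."""
    return math.sqrt(mse)


# Three synthetic "talkers": sine tones standing in for speech (illustrative only).
sample_rate = 8000
clips = [
    [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(sample_rate)]
    for freq in (110.0, 220.0, 330.0)
]
mixture = mix_talkers(clips)                        # overlapping-talker clip
perturbed = shuffle_frames(mixture, frame_len=200)  # 25 ms frames, reordered
```

Shuffling keeps every sample but destroys the temporal order that speech recognition relies on, which is consistent with the abstract's finding that speech content is easier to conceal than speaker identity. As a side calculation, `rmse_from_mse(0.9)` and `rmse_from_mse(1.6)` give roughly 0.95 and 1.26, which is one way to read the reported MSE range as an error of about one person.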