Enhancing Privacy Through Domain Adaptive Noise Injection for Speech Emotion Recognition
Tiantian Feng, Hanieh Hashemi, Murali Annavaram, Shrikanth Narayanan
SPS
Length: 00:15:11
Speech emotion recognition (SER) techniques have gained considerable interest in many applications, including smart virtual assistants and health state tracking. SER systems often acquire speech data on the client side and transmit it to remote cloud platforms for inference and decision making. However, speech data carry rich information not only about the emotions conveyed in vocal expressions but also about other sensitive demographic traits, such as gender, age, and language background. Ideally, one would select only the features necessary for emotion classification while protecting sensitive ones. However, some features that are necessary for emotion classification may also reveal demographic traits. In this work, we propose a method to improve inference privacy by injecting noise into the input speech data without substantially degrading SER performance. The approach combines a noise representation learning architecture, called Cloak, with adversarial training to retain the information needed for emotion classification while removing information that would enable inference of sensitive demographic attributes. Experimental results show that our method can effectively prevent inference of sensitive demographic information, and that the improved privacy comes at the cost of only a minor utility loss in emotion classification.
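The abstract describes the approach only at a high level. As a rough illustration of how noise injection and adversarial training can be combined, the NumPy sketch below computes the forward pass and the two competing objectives. It assumes Gaussian noise with a learnable per-feature mean (`mu`) and log-scale (`log_sigma`), linear emotion and adversary heads, and the weights `lam` and `beta`; all of these choices and every function name here are hypothetical and are not taken from the paper or from the Cloak implementation. Gradient updates are omitted; in practice both objectives would be optimized in alternation with an autodiff framework.

```python
import numpy as np

def inject_noise(x, mu, log_sigma, rng):
    """Add learnable Gaussian noise to features x of shape (batch, dim).

    mu and log_sigma are per-feature parameters of shape (dim,) that a
    noise generator would learn (hypothetical parameterization). Writing
    the noise as mu + sigma * eps keeps it differentiable w.r.t. mu and
    log_sigma (the reparameterization trick).
    """
    eps = rng.standard_normal(x.shape)
    return x + mu + np.exp(log_sigma) * eps

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over the batch, computed stably in log space."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def generator_objective(x_noisy, emo_labels, sens_labels, w_emo, w_adv,
                        log_sigma, lam=1.0, beta=0.1):
    """Loss the noise generator minimizes (hypothetical weights lam, beta).

    - emotion loss: keep emotion information (minimized)
    - adversary loss: remove demographic information (maximized, hence the minus)
    - noise term: reward larger noise scales via the mean of log_sigma
    """
    l_emo = softmax_cross_entropy(x_noisy @ w_emo, emo_labels)
    l_adv = softmax_cross_entropy(x_noisy @ w_adv, sens_labels)
    return l_emo - lam * l_adv - beta * float(np.mean(log_sigma))

def adversary_objective(x_noisy, sens_labels, w_adv):
    """The adversary, trained in alternation, minimizes its own loss."""
    return softmax_cross_entropy(x_noisy @ w_adv, sens_labels)

# Usage with random data (all sizes are hypothetical).
rng = np.random.default_rng(0)
batch, dim = 8, 40                            # 8 utterances, 40-dim acoustic features
x = rng.standard_normal((batch, dim))
mu = np.zeros(dim)
log_sigma = np.full(dim, -1.0)
w_emo = rng.standard_normal((dim, 4)) * 0.1   # 4 emotion classes
w_adv = rng.standard_normal((dim, 2)) * 0.1   # binary sensitive attribute
emo_labels = rng.integers(0, 4, size=batch)
sens_labels = rng.integers(0, 2, size=batch)

x_noisy = inject_noise(x, mu, log_sigma, rng)
print(x_noisy.shape)  # (8, 40)
print(generator_objective(x_noisy, emo_labels, sens_labels, w_emo, w_adv, log_sigma))
print(adversary_objective(x_noisy, sens_labels, w_adv))
```

The minus sign on the adversary loss is what pushes the generator toward noise that hides the sensitive attribute, while the emotion term limits how much task-relevant information that noise may destroy; the balance between the two is set by `lam`.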