Generating And Protecting Against Adversarial Attacks For Deep Speech-Based Emotion Recognition Models
Zhao Ren, Alice Baird, Jing Han, Zixing Zhang, Björn Schuller
The development of deep learning models for speech emotion recognition has become a popular area of research. Adversarially generated data can cause false predictions, and in an endeavor to ensure model robustness, defense methods against such attacks should be addressed. With this in mind, in this study, we aim to train deep models to defend against non-targeted white-box adversarial attacks. Adversarial data is first generated from the real data using the fast gradient sign method (FGSM). Adversarial training is then employed as a method of protecting against adversarial attacks in the field of speech emotion recognition. We train deep convolutional models with both real and adversarial data, and compare the performance of two adversarial training procedures, namely, vanilla adversarial training and similarity-based adversarial training. In our experiments, through adversarial data augmentation, both of the considered adversarial training procedures improve performance when validated on the real data. Additionally, similarity-based adversarial training learns a more robust model when working with adversarial data. Finally, the considered VGG-16 model performs best across all models, on both real and generated data.
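To make the two training procedures concrete, below is a minimal PyTorch sketch of FGSM generation followed by one step of each procedure. It assumes `model` is any convolutional emotion classifier (e.g., a VGG-16 variant) taking spectrogram tensors `x` with labels `y`; the epsilon value, the loss weighting, and the cosine-similarity term in the similarity-based variant are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.01):
    """Non-targeted white-box FGSM: perturb x along the sign of the
    gradient of the classification loss w.r.t. the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()

def vanilla_adversarial_step(model, optimizer, x, y, epsilon=0.01):
    """Vanilla adversarial training: optimize the classification loss
    on both the real and the FGSM-perturbed batch."""
    x_adv = fgsm_attack(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

def similarity_adversarial_step(model, optimizer, x, y,
                                epsilon=0.01, lam=1.0):
    """Similarity-based adversarial training (sketch): add a term that
    pulls the model's outputs on real and adversarial inputs together.
    Cosine similarity over logits and the weight `lam` are assumptions."""
    x_adv = fgsm_attack(model, x, y, epsilon)
    optimizer.zero_grad()
    logits, logits_adv = model(x), model(x_adv)
    ce = F.cross_entropy(logits, y) + F.cross_entropy(logits_adv, y)
    sim = 1.0 - F.cosine_similarity(logits, logits_adv, dim=1).mean()
    loss = ce + lam * sim
    loss.backward()
    optimizer.step()
    return loss.item()
```

In both procedures the adversarial batch acts as data augmentation, which is why validation performance on real data can improve; the extra similarity term additionally encourages predictions to be stable under FGSM perturbations, matching the reported robustness gain on adversarial data.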