StarGAN for Emotional Speech Conversion: Validated by Data Augmentation of End-to-End Emotion Recognition
Georgios Rizos, Alice Baird, Max Elliott, Björn Schuller
SPS
Length: 12:37
In this paper, we propose an adversarial network implementation for speech emotion conversion as a data augmentation method, validated by a multi-class speech affect recognition task. In our setting, we do not assume the availability of parallel data, and we additionally prioritize exploiting as much of the available training data as possible by adopting a cycle-consistent, class-conditional generative adversarial network with an auxiliary domain classifier. Our generated samples are valuable for data augmentation, yielding absolute increases of 2% in Micro-F1 and 6% in Macro-F1 over the baseline in a 3-class classification paradigm using a deep, end-to-end network. Finally, we perform a human perception evaluation of the samples, through which we conclude that our samples are indicative of their target emotion, albeit with a tendency toward confusion in cases where the emotional attributes of valence and arousal are inconsistent.
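The model described above (cycle-consistent, class-conditional GAN with an auxiliary domain classifier) follows the StarGAN design. As a rough illustration of how such a generator objective is usually composed, the sketch below sums three terms: an adversarial term, a domain-classification term that pushes the converted sample toward the target emotion, and an L1 cycle-reconstruction term comparing the input with its round-trip conversion back to the source emotion. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name `generator_loss`, its arguments, the binary cross-entropy form of the adversarial term, and the default weights (lambda_cls=1, lambda_rec=10, mirroring those reported for the original StarGAN image model) are all assumptions.

```python
import math
from typing import Sequence


def softplus(y: float) -> float:
    # Numerically stable log(1 + exp(y)).
    return math.log1p(math.exp(-abs(y))) + max(y, 0.0)


def log_softmax(logits: Sequence[float]) -> list:
    # Subtract the max before exponentiating to avoid overflow.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def generator_loss(
    d_fake_logit: float,                 # discriminator real/fake logit on G(x, c_target)
    cls_logits_fake: Sequence[float],    # auxiliary classifier logits on G(x, c_target)
    target_class: int,                   # index of the target emotion c_target
    x_real: Sequence[float],             # original input features x
    x_reconstructed: Sequence[float],    # G(G(x, c_target), c_source), the cycle reconstruction
    lambda_cls: float = 1.0,             # assumed weight, as in the original StarGAN
    lambda_rec: float = 10.0,            # assumed weight, as in the original StarGAN
) -> float:
    """Hypothetical StarGAN-style generator objective (sketch, not the paper's code)."""
    if len(x_real) != len(x_reconstructed):
        raise ValueError("x_real and x_reconstructed must have the same length")
    # Adversarial term: -log sigmoid(D(G(x, c))) == softplus(-logit).
    adv = softplus(-d_fake_logit)
    # Domain classification: cross-entropy of the target emotion on the fake sample.
    cls = -log_softmax(cls_logits_fake)[target_class]
    # Cycle consistency: mean absolute error between x and its round trip.
    rec = sum(abs(a - b) for a, b in zip(x_real, x_reconstructed)) / len(x_real)
    return adv + lambda_cls * cls + lambda_rec * rec
```

For example, with an uninformative discriminator (logit 0), a uniform 3-class classifier, and a perfect reconstruction, the loss is log 2 + log 3 = log 6. The discriminator and auxiliary classifier are trained with their own losses on real and generated samples; only the generator side is shown here.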