StarGAN for Emotional Speech Conversion: Validated by Data Augmentation of End-to-End Emotion Recognition

Georgios Rizos, Alice Baird, Max Elliott, Björn Schuller

Length: 12:37

04 May 2020

In this paper, we propose an adversarial network implementation for speech emotion conversion as a data augmentation method, validated by a multi-class speech affect recognition task. We do not assume the availability of parallel data, and we prioritise exploiting the available training data as fully as possible by adopting a cycle-consistent, class-conditional generative adversarial network with an auxiliary domain classifier. Our generated samples are valuable for data augmentation, yielding absolute increases of 2% and 6% in Micro- and Macro-F1, respectively, over the baseline in a 3-class classification paradigm using a deep, end-to-end network. Finally, we perform a human perception evaluation of the samples, from which we conclude that they are indicative of their target emotion, albeit with a tendency for confusion in cases where the emotional attributes of valence and arousal are inconsistent.
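The abstract reports gains in both Micro- and Macro-F1, and the two can diverge considerably when classes are imbalanced. As a minimal sketch of the standard definitions only (not the authors' evaluation code), the snippet below computes both scores for single-label predictions. The class names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def f1_scores(y_true, y_pred, labels):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro-F1 pools the counts over all classes before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1 averages per-class F1, so rare classes weigh as much as common ones.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical, imbalanced 3-class toy data (illustrative only).
labels = ["angry", "neutral", "sad"]
y_true = ["neutral"] * 8 + ["angry"] + ["sad"]
y_pred = ["neutral"] * 8 + ["neutral"] + ["sad"]

micro, macro = f1_scores(y_true, y_pred, labels)
print(f"Micro-F1 = {micro:.3f}, Macro-F1 = {macro:.3f}")
```

In this toy case the single missed minority-class sample barely moves Micro-F1 but pulls Macro-F1 down sharply. This is one reason an augmentation method that helps under-represented classes may show a larger improvement in Macro-F1 than in Micro-F1.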

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020
