Speaker-Invariant Affective Representation Learning Via Adversarial Training
Haoqi Li, Ming Tu, Jing Huang, Shrikanth Narayanan, Panayiotis Georgiou
Representation learning for speech emotion recognition is challenging due to the sparsity of labeled data and the lack of gold-standard references. In addition, there is considerable variability arising from the input speech signals, human subjective perception of those signals, and ambiguity in the emotion labels themselves. In this paper, we propose a machine learning framework that obtains speech emotion representations by limiting the effect of speaker variability in the speech signals. Specifically, we propose to disentangle speaker characteristics from emotion through an adversarial training network in order to better represent emotion. Our method combines the gradient reversal technique with an entropy loss function to remove such speaker information. We evaluate our approach on the IEMOCAP and CMU-MOSEI datasets to provide a cross-corpus evaluation, and show that this removal of speaker information can improve speech emotion classification.
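As a rough illustration of the two ingredients named in the abstract, the sketch below (a minimal NumPy toy, not the authors' implementation) shows (a) a gradient reversal layer, which is the identity in the forward pass but negates and scales the gradient flowing back from a speaker classifier, and (b) an entropy measure on a speaker posterior: driving that posterior toward the uniform distribution (maximum entropy) means the representation carries no usable speaker cue. The function names and the scaling factor `lambd` are illustrative choices, not from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over speaker logits.
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    # Shannon entropy of a posterior; maximal when p is uniform.
    return -np.sum(p * np.log(p + 1e-12))

def grl_forward(x):
    # Gradient reversal layer: identity in the forward pass.
    return x

def grl_backward(grad_from_speaker_head, lambd=1.0):
    # Backward pass: negate (and scale) the gradient, so the encoder is
    # updated to *hurt* the speaker classifier rather than help it.
    return -lambd * grad_from_speaker_head

# A confident (peaked) speaker posterior has lower entropy than the
# uniform one; an entropy loss pushes the posterior toward uniform.
uniform = softmax(np.zeros(4))
peaked = softmax(np.array([5.0, 0.0, 0.0, 0.0]))
assert entropy(uniform) > entropy(peaked)
```

In a full model, `grl_backward` sits between the shared encoder and the speaker-classification head, while the entropy term penalizes confident speaker predictions; together they discourage the learned representation from encoding speaker identity.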