Domain-Adversarial Autoencoder With Attention Based Feature Level Fusion For Speech Emotion Recognition

Yuan Gao, Jiaxing Liu, Longbiao Wang, Jianwu Dang

Length: 00:04:50

09 Jun 2021

Although speech emotion recognition (SER) has garnered considerable attention over the past two decades, the problem of insufficient training data remains unresolved. A potential solution is to pre-train a model on large amounts of audio data and transfer the learned knowledge. However, the data used for pre-training and testing originate from different domains, which causes the latent representations to contain non-affective information. In this paper, we propose a domain-adversarial autoencoder to extract discriminative representations for SER. Through domain-adversarial learning, we reduce the mismatch between domains while retaining discriminative information for emotion recognition. We also introduce multi-head attention to capture emotion information from different subspaces of input utterances. Experiments on IEMOCAP show that the proposed model outperforms state-of-the-art systems, improving the unweighted accuracy by 4.15%, which demonstrates its effectiveness.
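The abstract reports its gain in unweighted accuracy (UA) without defining the metric. In the SER literature UA is commonly taken to be the mean of per-class recalls, so that rare emotion classes count as much as frequent ones; the sketch below assumes that definition. The function names and the toy labels are hypothetical and are not taken from the paper or from IEMOCAP.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls (assumed definition of UA).

    Every class contributes equally, regardless of how many
    samples it has.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    if not total:
        raise ValueError("no samples given")
    return sum(correct[c] / total[c] for c in total) / len(total)


def weighted_accuracy(y_true, y_pred):
    """Plain sample-level accuracy, shown for contrast."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("no samples given")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, deliberately imbalanced toy labels.
y_true = ["neutral"] * 6 + ["happy"] * 2 + ["angry"] * 2
y_pred = ["neutral"] * 6 + ["happy", "neutral"] + ["neutral", "neutral"]

ua = unweighted_accuracy(y_true, y_pred)  # recalls: 1.0, 0.5, 0.0
wa = weighted_accuracy(y_true, y_pred)    # 7 of 10 samples correct
```

On this toy data a classifier biased toward the majority class scores noticeably lower on UA than on plain accuracy, which is why UA is the usual headline metric on class-imbalanced emotion corpora.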

Chairs:

Carlos Busso

Tags:

signal processing society

IEEE icassp 2021

virtual conference

june 6-11 2021
