Zero-Shot Speech Emotion Recognition Using Generative Learning with Reconstructed Prototypes

Xinzhou Xu (Nanjing University of Posts and Telecommunications); Jun Deng (Agile Robots AG); Zixing Zhang (Imperial College London); Zhen Yang (Nanjing University of Posts and Telecommunications); Bjorn W. Schuller (Imperial College London)

07 Jun 2023

Zero-shot Speech Emotion Recognition (SER) enables machines to recognize speech from unseen emotional states without any training samples from those states, which is helpful in audio-based autonomous affective computing. However, existing works on zero-shot SER employ the original prototypes directly and consider only inter-domain knowledge transfer through learning unseen-emotional classifiers. To address this, we propose a zero-shot SER approach using generative learning with reconstructed prototypes. Within the proposed approach, we first reconstruct the prototypes using an alignment from paralinguistic features to semantic prototypes. Generative learning is then performed to build the connection from the reconstructed prototypes to the features. Zero-shot experiments on emotional-speech data demonstrate that the proposed approach outperforms state-of-the-art approaches.
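
The two stages named in the abstract (prototype reconstruction via a feature-to-prototype alignment, then a generative mapping from prototypes to features) can be illustrated with a toy sketch. This is not the authors' method: the abstract gives no equations, so the sketch assumes closed-form ridge regression for both mappings, class-mean reconstruction of the seen prototypes, unseen prototypes used unchanged, and nearest-centroid classification. All function and variable names (fit_ridge, zero_shot_predict, lam) are hypothetical.

```python
import numpy as np


def fit_ridge(A, B, lam=1e-2):
    """Closed-form ridge regression: argmin_W ||A W - B||^2 + lam ||W||^2."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ B)


def zero_shot_predict(X_seen, y_seen, P_seen, P_unseen, X_test, lam=1e-2):
    """Toy zero-shot classifier (illustrative assumption, not the paper's model).

    X_seen:   (n, d) features of seen-emotion samples
    y_seen:   (n,)   integer labels indexing rows of P_seen
    P_seen:   (c_s, k) semantic prototypes of seen emotions
    P_unseen: (c_u, k) semantic prototypes of unseen emotions
    X_test:   (m, d) features of unseen-emotion samples
    Returns an (m,) array of indices into P_unseen.
    """
    # Alignment (assumed linear): map features into the prototype space.
    W = fit_ridge(X_seen, P_seen[y_seen], lam)

    # Reconstruction (assumed): each seen prototype becomes the mean of
    # its class's aligned features.
    R_seen = np.stack(
        [(X_seen[y_seen == c] @ W).mean(axis=0) for c in range(len(P_seen))]
    )

    # Generative step (assumed linear): map reconstructed prototypes to features.
    G = fit_ridge(R_seen[y_seen], X_seen, lam)

    # Synthesize one feature centroid per unseen emotion. The unseen
    # prototypes are used as given, since no unseen samples exist.
    centroids = P_unseen @ G

    # Assign each test sample to the nearest synthesized centroid.
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)
```

A linear generator can only extrapolate to unseen emotions whose prototypes lie in the span of the seen prototypes, so the sketch works on synthetic data built that way and says nothing about performance on real emotional speech.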

Tags:

Speech analysis and Language disorder Analysis

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
