Sentiment-Aware Automatic Speech Recognition pre-training for enhanced Speech Emotion Recognition

Ayoub Ghriss, Bo Yang, Viktor Rozgic, Wang Chao, Elizabeth Shriberg


Length: 00:14:26

10 May 2022

We propose a novel multi-task pre-training method for Speech Emotion Recognition (SER). We pre-train the SER model simultaneously on Automatic Speech Recognition (ASR) and sentiment classification tasks to make the acoustic ASR model more "emotion aware". We generate the targets for sentiment classification using a text-to-sentiment model trained on publicly available data. Finally, we fine-tune the acoustic ASR model on emotion-annotated speech data. We evaluated the proposed approach on the MSP-Podcast dataset, where it achieved the best reported concordance correlation coefficient (CCC) of 0.41 for valence prediction.
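For readers unfamiliar with the reported metric, the following is a minimal sketch of how a concordance correlation coefficient can be computed, using the standard definition CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The function name `concordance_cc` and the handling of the zero-denominator case are assumptions made for illustration; this is not the authors' evaluation code.

```python
from statistics import fmean


def concordance_cc(predictions, labels):
    """Concordance correlation coefficient between two equal-length sequences.

    Hypothetical helper for illustration only; uses population (biased)
    variance and covariance, as in the standard definition of CCC.
    """
    if len(predictions) != len(labels) or len(predictions) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_p = fmean(predictions)
    mean_l = fmean(labels)

    # Population variances and covariance.
    var_p = fmean([(p - mean_p) ** 2 for p in predictions])
    var_l = fmean([(l - mean_l) ** 2 for l in labels])
    cov = fmean([(p - mean_p) * (l - mean_l)
                 for p, l in zip(predictions, labels)])

    denominator = var_p + var_l + (mean_p - mean_l) ** 2
    if denominator == 0:
        # Both sequences are constant and identical; treating this as
        # perfect agreement is a convention assumed here.
        return 1.0
    return 2 * cov / denominator


# A constant offset lowers CCC even though the Pearson correlation is 1.
print(concordance_cc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # 4/7, about 0.571
```

Unlike Pearson correlation, CCC penalizes differences in mean and scale between predictions and labels, which is why it is commonly used for dimensional emotion attributes such as valence.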

Tags:

pre-training

automatic speech recognition

sentiment analysis

speech emotion recognition

Value-Added Bundle(s) Including this Product

22 May 2022

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
