Unsupervised domain adaptation for preference learning based speech emotion recognition

Abinay Reddy Naini (The University of Texas at Dallas); Mary Kohler (Laboratory for Analytic Sciences, North Carolina State University); Carlos Busso (The University of Texas at Dallas)

07 Jun 2023

Retrieving speech samples that have specific expressive content has many applications. It is desirable to build a preference learning framework that ranks speech samples according to emotional attribute values that generalize well to new domains. A popular architecture for preference learning is the RankNet framework, which uses a function to obtain the preference between pairs of speech sentences. This study explores implementing this function with alternative feature representations that are explicitly selected to reduce the mismatch between source and target domains. In particular, we implement our preference-learning based speech emotion recognition (SER) system using ladder networks and adversarial domain adaptation. The study also proposes a novel combination of these two unsupervised domain adaptation strategies. The experimental results in cross-corpus evaluations using the MSP-Podcast and MSP-IMPROV datasets reveal that the proposed adversarial domain adaptation on a ladder network-based feature representation performs the best across different conditions. The results also show that preference learning leads to better precision for retrieval tasks than comparable SER systems built to directly predict absolute emotional attribute scores.
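The pairwise preference function mentioned in the abstract can be illustrated with a minimal sketch of the standard RankNet formulation: each sample gets a scalar score, and the probability that one sample ranks above another is the sigmoid of the score difference. This is not the authors' implementation. The linear `score` function, the feature vectors, the weights, and the utterance names are hypothetical placeholders; the paper instead learns the representation with ladder networks and adversarial domain adaptation.

```python
import math

def score(features, weights):
    """Hypothetical linear scorer standing in for the learned network."""
    return sum(w * x for w, x in zip(weights, features))

def preference_probability(s_a, s_b):
    """RankNet-style probability that sample A ranks above sample B."""
    return 1.0 / (1.0 + math.exp(-(s_a - s_b)))

def pairwise_loss(s_a, s_b, label):
    """Binary cross-entropy on one pair; label is 1.0 if A is preferred, else 0.0."""
    eps = 1e-12  # guards against log(0)
    p = preference_probability(s_a, s_b)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))

def rank(samples, weights):
    """Order sample names from highest to lowest score, e.g. by predicted arousal."""
    return sorted(samples, key=lambda name: score(samples[name], weights), reverse=True)

# Toy data (assumed values, for illustration only).
weights = [0.5, -0.25]
samples = {
    "utt_a": [2.0, 0.0],
    "utt_b": [0.0, 2.0],
    "utt_c": [1.0, 1.0],
}
ranking = rank(samples, weights)
```

At retrieval time only the scores are needed: sorting by score gives the ranked list, and the top entries are the samples retrieved for the target emotional attribute.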

Tags:

Speech emotion detection and analysis

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
