Unsupervised domain adaptation for preference learning based speech emotion recognition
Abinay Reddy Naini (The University of Texas at Dallas); Mary Kohler (Laboratory for Analytic Sciences, North Carolina State University); Carlos Busso (The University of Texas at Dallas)
SPS
Retrieving speech samples that have specific expressive content has many applications. It is desirable to build a preference learning framework that ranks speech samples according to emotional attribute values that generalize well to new domains. A popular architecture for preference learning is the RankNet framework, which uses a function to obtain the preference between pairs of speech sentences. This study explores implementing this function with alternative feature representations that are explicitly selected to reduce the mismatch between source and target domains. In particular, we implement our preference-learning based speech emotion recognition (SER) system using ladder networks and adversarial domain adaptation. The study also proposes a novel combination of these two unsupervised domain adaptation strategies. The experimental results in cross-corpus evaluations using the MSP-Podcast and MSP-IMPROV datasets reveal that the proposed adversarial domain adaptation on a ladder network-based feature representation performs the best across different conditions. The results also show that preference learning leads to better precision for retrieval tasks than comparable SER systems built to directly predict absolute emotional attribute scores.
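To make the pairwise idea concrete, here is a minimal sketch of the standard RankNet formulation, in which a scoring function f is applied to each sample of a pair and the preference probability is the sigmoid of the score difference. The linear `score` function, the `weights` vector, and all function names are hypothetical placeholders for illustration; the abstract indicates that the paper implements the function with ladder-network and adversarially adapted feature representations, which this sketch does not attempt to reproduce.

```python
import math


def score(features, weights):
    """Hypothetical linear scoring function f(x), assumed for illustration only."""
    return sum(w * x for w, x in zip(weights, features))


def preference_probability(feat_a, feat_b, weights):
    """RankNet-style probability that sample A ranks above sample B."""
    diff = score(feat_a, weights) - score(feat_b, weights)
    return 1.0 / (1.0 + math.exp(-diff))


def pairwise_loss(feat_a, feat_b, weights, a_preferred):
    """Binary cross-entropy between the predicted preference and the pair label."""
    p = preference_probability(feat_a, feat_b, weights)
    target = 1.0 if a_preferred else 0.0
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def rank_samples(samples, weights):
    """Order samples from highest to lowest score, as in a retrieval task."""
    return sorted(samples, key=lambda feats: score(feats, weights), reverse=True)
```

Because the same scoring function is shared by both samples of a pair, a model trained on pairwise labels can afterwards rank any list of samples by sorting on the individual scores, which is what makes this setup suitable for retrieval.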