Ordinal Learning For Emotion Recognition In Customer Service Calls
Wenjing Han, Björn Schuller, Huabin Ruan
Approaches to ordinal speech emotion recognition (SER) tasks are commonly based on categorical classification algorithms, in which rank-ordered emotions are arbitrarily treated as independent categories. To exploit the ordinal information between emotional ranks, we propose to model ordinal SER tasks under a COnsistent RAnk Logits (CORAL) based deep learning framework. Specifically, a multi-class ordinal SER task is transformed into a series of binary SER sub-tasks, each predicting whether an utterance's emotion exceeds a given rank. All sub-tasks are jointly solved by a single network with a mislabelling cost defined as the sum of the individual cross-entropy losses of the sub-tasks. Taking VGGish as the basic network structure and minimizing the above CORAL-based cost, a VGGish-CORAL network is implemented in this contribution. Experimental results on a real-world call center dataset and the widely used IEMOCAP corpus demonstrate the effectiveness of VGGish-CORAL compared with the categorical VGGish.
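The sketch below illustrates, in PyTorch, the general CORAL recipe the abstract describes: integer emotion ranks are expanded into K-1 binary "is the rank greater than k?" targets, a single head with shared weights and K-1 rank-specific biases produces the corresponding logits, and the training cost is the sum of the binary cross-entropy losses of all sub-tasks. It is a minimal illustration only, not the authors' implementation; the encoder is a hypothetical placeholder standing in for the VGGish backbone, and the number of ranks, input shape, and layer sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_RANKS = 4  # assumed number of ordinal emotion ranks

def extend_labels(y, num_ranks=NUM_RANKS):
    """Expand an integer rank y in {0..K-1} into K-1 binary targets,
    where target k answers: is the emotion rank greater than k?"""
    levels = torch.arange(num_ranks - 1, device=y.device)
    return (y.unsqueeze(1) > levels).float()             # shape (batch, K-1)

class CoralHead(nn.Module):
    """CORAL output layer: one shared weight vector plus K-1 rank-specific
    biases, which keeps the K-1 binary predictions rank-consistent."""
    def __init__(self, in_features, num_ranks=NUM_RANKS):
        super().__init__()
        self.fc = nn.Linear(in_features, 1, bias=False)  # shared weights
        self.biases = nn.Parameter(torch.zeros(num_ranks - 1))

    def forward(self, x):
        return self.fc(x) + self.biases                  # (batch, K-1) logits

def coral_loss(logits, extended_targets):
    """Sum of the binary cross-entropy losses over all K-1 sub-tasks,
    averaged over the batch."""
    return F.binary_cross_entropy_with_logits(
        logits, extended_targets, reduction="sum") / logits.size(0)

# Hypothetical usage: replace `encoder` with a VGGish-style feature extractor.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 96, 128), nn.ReLU())
head = CoralHead(128)

x = torch.randn(8, 1, 64, 96)            # dummy log-mel patches (batch, 1, mels, frames)
y = torch.randint(0, NUM_RANKS, (8,))    # integer emotion ranks
loss = coral_loss(head(encoder(x)), extend_labels(y))

# At inference, the predicted rank is the number of sub-tasks answered "yes".
pred_rank = (torch.sigmoid(head(encoder(x))) > 0.5).sum(dim=1)
```

Sharing one weight vector across the K-1 binary logits (varying only the biases) is what distinguishes CORAL from training K-1 independent classifiers: it guarantees the predicted probabilities decrease monotonically with the rank threshold, so the binary answers never contradict each other.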