  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:11:17
11 May 2022

Although speech emotion recognition (SER) has been actively studied, the amount and variety of its training data are limited compared with those for speech recognition and speaker recognition. Combining multiple corpora to train a generalized SER model is therefore promising. However, the way emotion is expressed differs across settings, task domains, and languages. In particular, there is a mismatch between acted and spontaneous datasets, since the former contain much richer and more explicit emotional expressions than the latter. In this paper, we investigate effective combination methods based on multi-task learning (MTL) that take this style attribute into account. We also hypothesize that the neutral expression, which has the largest number of samples, is not affected by style, and thus propose a selective MTL method that applies MTL to all emotion categories except neutral. Experimental evaluations using the IEMOCAP database and a call center dataset confirm the effectiveness of combining the two corpora, of MTL, and of the proposed selective MTL.
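
The abstract does not give the model or the loss, so the following is only a minimal sketch of one way the selective MTL idea could be read, not the authors' implementation. It assumes the auxiliary task is style classification (acted vs. spontaneous), that both tasks use cross-entropy, and that the style loss is simply skipped for samples labeled neutral. The names `NEUTRAL`, `cross_entropy`, `selective_mtl_loss`, and `style_weight` are hypothetical and chosen for illustration.

```python
import math

NEUTRAL = 0  # assumption: index of the neutral emotion category


def cross_entropy(logits, label):
    """Negative log-softmax probability of the true label for one sample."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[label]


def selective_mtl_loss(emotion_logits, style_logits, emotion_labels,
                       style_labels, style_weight=0.5):
    """Emotion loss over all samples plus style loss over non-neutral ones.

    style_weight is an assumed hyperparameter balancing the two tasks.
    """
    n = len(emotion_labels)
    emotion_loss = sum(
        cross_entropy(logits, y)
        for logits, y in zip(emotion_logits, emotion_labels)
    ) / n

    # Selective part: neutral samples do not contribute to the style task.
    selected = [i for i, y in enumerate(emotion_labels) if y != NEUTRAL]
    if selected:
        style_loss = sum(
            cross_entropy(style_logits[i], style_labels[i]) for i in selected
        ) / len(selected)
    else:
        style_loss = 0.0

    return emotion_loss + style_weight * style_loss


# Two samples: the first is neutral, so its style prediction is ignored.
loss = selective_mtl_loss(
    emotion_logits=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    style_logits=[[5.0, -5.0], [0.0, 0.0]],
    emotion_labels=[NEUTRAL, 1],
    style_labels=[1, 0],
)
```

In this sketch the first sample's style prediction is badly wrong, yet it leaves the total unchanged because the sample is neutral. In a real system the same masking would be applied to the auxiliary loss inside whatever training framework is used.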
