EMix: A Data Augmentation Method for Speech Emotion Recognition
An Dang (National Central University); Toan H Vu (National Central University); Nguyen Dinh Le (National Central University); Jia-Ching Wang (National Central University)
SPS
In recent years, many deep learning (DL) models have been developed to improve the accuracy of speech emotion recognition (SER). However, because SER datasets are difficult and expensive to collect, they are generally small, which leaves DL models prone to overfitting and limits their performance. In this paper, we introduce EMix, a novel data augmentation (DA) method for SER that is simple but effective. The method creates new data by mixing pairs of selected samples from the original data. The generated mixtures are noisier or less ambiguous than their constituent samples. To verify the effectiveness of the proposed DA method, we develop a transformer-based network for the SER task and experiment on two public datasets, IEMOCAP and CREMA-D. The experimental results demonstrate the superiority of EMix over other DA methods, and our approach is competitive with state-of-the-art methods.
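To make the idea of mixing a pair of samples concrete, the sketch below blends two waveforms and their labels. The abstract does not give the EMix mixing formula or the rule used to select pairs, so everything here is an assumption: the blend is a generic mixup-style linear interpolation, the function name `mix_pair` and the weight `lam` are hypothetical, and the pair is taken as given rather than selected.

```python
import random


def mix_pair(wave_a, wave_b, label_a, label_b, num_classes, lam):
    """Blend two waveforms and their class labels.

    Assumption: a mixup-style linear interpolation with weight `lam`.
    This is NOT confirmed to be the EMix rule, which the abstract omits.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Truncate both clips to the shorter one so samples align one-to-one.
    n = min(len(wave_a), len(wave_b))
    mixed_wave = [lam * a + (1.0 - lam) * b
                  for a, b in zip(wave_a[:n], wave_b[:n])]
    # Soft label: each source class receives its share of the mixing weight.
    mixed_label = [0.0] * num_classes
    mixed_label[label_a] += lam
    mixed_label[label_b] += 1.0 - lam
    return mixed_wave, mixed_label


# Toy usage with synthetic "waveforms" (random values, not real speech).
rng = random.Random(0)
wave_a = [rng.uniform(-1.0, 1.0) for _ in range(8)]
wave_b = [rng.uniform(-1.0, 1.0) for _ in range(6)]
mixed_wave, mixed_label = mix_pair(wave_a, wave_b, 0, 2, num_classes=4, lam=0.7)
```

Under this assumed rule, mixing two samples of the same emotion keeps a one-hot label, while mixing samples of different emotions yields a soft label split between the two classes.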