MULTI-STAGE GRAPH REPRESENTATION LEARNING FOR DIALOGUE-LEVEL SPEECH EMOTION RECOGNITION
Yaodong Song, Jiaxing Liu, Longbiao Wang, Ruiguo Yu, Jianwu Dang
SPS
Length: 00:11:19
Although speech emotion recognition (SER) has developed rapidly, most current research operates at the utterance level, which does not meet the needs of real-world scenarios. In this paper, we propose a novel strategy that focuses on capturing dialogue-level contextual information. Building on utterance-level representations learned by a convolutional neural network (CNN) followed by a bidirectional long short-term memory network (BLSTM), the proposed dialogue-level method consists of two modules. The first module is the Dialogue Multi-stage Graph Representation Learning Algorithm (DialogMSG), which introduces a multi-stage graph that models the dialogue at different scopes in order to capture more effective contextual information. The second is a double-constrained module, which includes both an utterance-level classifier and a dialogue-level graph classifier named Atmosphere. Extensive experiments show that the proposed method outperforms the current state of the art on the IEMOCAP benchmark dataset.
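The abstract does not say how DialogMSG builds its graphs or how the two classifiers are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each stage is a graph over the utterances of one dialogue whose edges link utterances within a hypothetical window (sizes 1, 2 and "whole dialogue" are illustrative), that node features are combined by an unweighted mean (a stand-in for a learned graph layer), and that stage outputs are concatenated per utterance. The double constraint is modelled as a hypothetical weighted sum of an utterance-level cross-entropy and a dialogue-level cross-entropy for the Atmosphere classifier. All function names and the `weight` parameter are invented for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def window_adjacency(n: int, window: int) -> List[List[int]]:
    """Hypothetical stage graph: utterance i links to every utterance j with
    |i - j| <= window (self-loops included); window < 0 means the whole dialogue."""
    return [
        [1 if window < 0 or abs(i - j) <= window else 0 for j in range(n)]
        for i in range(n)
    ]


def mean_aggregate(features: List[Vector], adj: List[List[int]]) -> List[Vector]:
    """One round of unweighted mean neighbourhood aggregation
    (assumption: stands in for a learned graph convolution)."""
    dim = len(features[0])
    out = []
    for row in adj:
        neighbours = [features[j] for j, linked in enumerate(row) if linked]
        out.append([sum(v[d] for v in neighbours) / len(neighbours) for d in range(dim)])
    return out


def multi_stage_graph(features: List[Vector],
                      windows: Sequence[int] = (1, 2, -1)) -> List[Vector]:
    """Aggregate over graphs of increasing dialogue scope, then concatenate
    the stage outputs for each utterance (local context first, global last)."""
    n = len(features)
    stages = [mean_aggregate(features, window_adjacency(n, w)) for w in windows]
    return [[x for stage in stages for x in stage[i]] for i in range(n)]


def cross_entropy(probs: Vector, label: int) -> float:
    """Negative log-likelihood of the true class under a probability vector."""
    return -math.log(probs[label])


def double_constrained_loss(utt_probs: List[Vector], utt_labels: List[int],
                            dlg_probs: Vector, dlg_label: int,
                            weight: float = 0.5) -> float:
    """Hypothetical combination: mean utterance-level loss plus a weighted
    dialogue-level (Atmosphere) loss; the weighting is an assumption."""
    utt = sum(cross_entropy(p, y) for p, y in zip(utt_probs, utt_labels)) / len(utt_labels)
    return utt + weight * cross_entropy(dlg_probs, dlg_label)
```

For example, `multi_stage_graph(features, windows=(1, -1))` on three two-dimensional utterance vectors returns a four-dimensional vector per utterance: its local-window average followed by the whole-dialogue average. Concatenating the stages keeps local and global context separate for the downstream classifier; a trained model would replace the mean with learnable graph layers applied to the CNN-BLSTM utterance representations.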