Speech Emotion Recognition With Local-Global Aware Deep Representation Learning

Jiaxing Liu, Zhilei Liu, Longbiao Wang, Lili Guo, Jianwu Dang

SPS Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 15:38

04 May 2020

Deep representation learning methods based on convolutional neural networks (CNNs) have demonstrated great success in speech emotion recognition (SER). However, the basic design of a CNN restricts it to modeling only local information well. The capsule network (CapsNet) can overcome this shortcoming of CNNs by capturing shallow global features from the spectrogram, although CapsNet cannot learn local and deep global information. In this paper, we propose a local-global aware deep representation learning system that mainly comprises two modules. One module contains a multi-scale CNN, the time-frequency CNN (TFCNN), to learn the local representation. In the other module, we introduce a structure with dense connections among multiple blocks to learn shallow and deep global information. Every block in this structure is a complete CapsNet improved by a new routing algorithm. The local and global representations are fed to the classifier, achieving an absolute improvement of at least 4.25% over benchmarks on IEMOCAP.
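To make the data flow in the abstract concrete, the following is a minimal sketch of the wiring only, not the authors' implementation. It assumes that "dense connections" means each block receives the original input concatenated with the outputs of all earlier blocks, that the global representation is the concatenation of all block outputs, and that the local and global representations are fused by concatenation before the classifier. None of these details are confirmed by the abstract. The names `make_toy_block`, `dense_global_branch`, and `fuse` are hypothetical, and the toy block is a simple averaging function standing in for a CapsNet.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_toy_block(out_dim: int) -> Block:
    """Hypothetical stand-in for one global block (NOT a CapsNet).

    It maps an input of any length to `out_dim` values by averaging
    strided slices, which is enough to show how the wiring behaves.
    """
    def block(x: Vector) -> Vector:
        out = []
        for k in range(out_dim):
            chunk = x[k::out_dim]  # every out_dim-th element, starting at k
            out.append(sum(chunk) / len(chunk) if chunk else 0.0)
        return out
    return block


def dense_global_branch(x: Vector, blocks: Sequence[Block]) -> Vector:
    """Assumed dense wiring: block i sees the input plus all earlier outputs."""
    features = [list(x)]  # everything produced so far, starting with the input
    for block in blocks:
        block_input = [v for f in features for v in f]  # concatenate
        features.append(block(block_input))
    # Assumed global representation: all block outputs, shallow to deep.
    return [v for f in features[1:] for v in f]


def fuse(local_repr: Vector, global_repr: Vector) -> Vector:
    """Assumed fusion: plain concatenation ahead of the classifier."""
    return list(local_repr) + list(global_repr)


# Toy usage with made-up numbers (not real spectrogram data).
spectrogram_features = [0.1, 0.4, 0.3, 0.9, 0.5, 0.2]
local_repr = [0.7, 0.2]  # placeholder for a TFCNN output
blocks = [make_toy_block(2) for _ in range(3)]
global_repr = dense_global_branch(spectrogram_features, blocks)
fused = fuse(local_repr, global_repr)
print(len(global_repr), len(fused))  # prints: 6 8
```

With three blocks of output size 2, the inputs to successive blocks grow from 6 to 8 to 10 values, which is the characteristic effect of dense connectivity: deeper blocks see both the raw input and every shallower block's features.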

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020
