Deep Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition

Yan Zhao (Southeast University); Jincen Wang (Southeast University); Yuan Zong (Southeast University); Wenming Zheng (Southeast University); Hailun Lian (Southeast University); Li Zhao (Southeast University)

07 Jun 2023

In this paper, we propose a novel deep transfer learning method called deep implicit distribution alignment networks (DIDAN) to deal with the cross-corpus speech emotion recognition (SER) problem, in which the labeled training (source) and unlabeled testing (target) speech signals come from different corpora. Specifically, DIDAN first adopts a simple deep regression network, consisting of a set of convolutional and fully connected layers, to directly regress the source speech spectra onto the emotion labels, so that DIDAN acquires emotion-discriminative ability. This ability is then transferred to the target speech samples, regardless of corpus variance, by means of a regularization term called implicit distribution alignment (IDA). Unlike the widely used maximum mean discrepancy (MMD) and its variants, the proposed IDA draws on the idea of sample reconstruction to implicitly reduce the distribution gap, which enables DIDAN to learn features from speech spectra that are both emotion-discriminative and corpus-invariant. To evaluate the proposed DIDAN, extensive cross-corpus SER experiments are carried out on widely used speech emotion corpora. Experimental results show that the proposed DIDAN outperforms many recent state-of-the-art methods on the cross-corpus SER tasks.
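
For readers unfamiliar with the baseline that the abstract contrasts IDA against, the following is a minimal sketch of the standard biased empirical estimate of squared MMD with a Gaussian kernel. It is not the IDA term itself, whose formulation is not given on this page. The function names, the `gamma` bandwidth, and the toy feature values are all assumptions made for illustration; in practice the inputs would be feature vectors produced by a network for source and target speech samples.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(source, target, gamma=1.0):
    """Biased empirical estimate of squared MMD between two feature sets."""
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))

# Toy 2-D "features" standing in for network activations (made-up values).
source_feats = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]]
shifted_target = [[1.0, 1.1], [1.2, 1.0], [1.1, 1.2]]

# Identical sets give a value of (numerically) zero; a shifted set gives a larger one.
print(mmd_squared(source_feats, source_feats))
print(mmd_squared(source_feats, shifted_target))
```

A term of this kind is typically added to the training loss as an explicit penalty on the source-target feature discrepancy; the abstract describes IDA as replacing that explicit penalty with a reconstruction-based one.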

Tags:

Speech production, perception and psychoacoustics

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
