Self-supervised representations in speech-based depression detection

Wen Wu (University of Cambridge); Chao Zhang (University of Cambridge); Phil Woodland (Machine Intelligence Laboratory, Cambridge University Department of Engineering)

06 Jun 2023

This paper proposes handling training data sparsity in speech-based automatic depression detection (SDD) using foundation models pre-trained with self-supervised learning (SSL). An analysis of SSL representations derived from different layers of pre-trained foundation models is first presented for SDD, which provides insight into suitable indicators for depression detection. Knowledge transfer is then performed from automatic speech recognition (ASR) and emotion recognition to SDD by fine-tuning the foundation models. Results show that using oracle and ASR transcriptions yields similar SDD performance when the hidden representations of the ASR model are incorporated along with the ASR textual information. By integrating representations from multiple foundation models, state-of-the-art SDD results based on real ASR were achieved on the DAIC-WOZ dataset.
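
As a rough illustration of what a layer-wise analysis of SSL representations can involve, the sketch below mean-pools frame-level features from each layer into one utterance-level vector per layer, so that each layer could then be evaluated separately as a depression indicator. This is not the authors' implementation: the helper names, the toy feature values, and the choice of mean pooling are all assumptions made for illustration. In practice the per-layer features would come from a pre-trained SSL foundation model rather than hand-written lists.

```python
from typing import List

# One layer's output for one utterance: frames x feature_dim (assumed layout).
Frames = List[List[float]]


def mean_pool(frames: Frames) -> List[float]:
    """Average frame-level vectors over time into one utterance-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def pool_per_layer(layer_outputs: List[Frames]) -> List[List[float]]:
    """Return one pooled vector per layer (hypothetical helper, not from the paper)."""
    return [mean_pool(frames) for frames in layer_outputs]


# Toy stand-in for hidden states: 2 layers, 3 frames each, 2 feature dimensions.
toy_layers = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]],
]
pooled = pool_per_layer(toy_layers)
```

Each entry of `pooled` could then be fed to a separate classifier, and the resulting per-layer scores compared to see which layers carry the most depression-related information.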

Tags:

Speech analysis and Language disorder Analysis

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
