A knowledge-driven vowel-based approach of depression classification from speech using data augmentation

Kexin Feng (Texas A&M University); Theodora Chaspari (Texas A&M University)

07 Jun 2023

We propose a novel explainable machine learning (ML) model that identifies depression from speech by modeling the temporal dependencies across utterances and using spectrotemporal information at the vowel level. Our method first maps each variable-length utterance, at the local level, to a fixed-size vowel-based embedding using a convolutional neural network with a spatial pyramid pooling layer ("vowel CNN"). Depression is then classified at the global level from a group of vowel CNN embeddings that serve as the input to a second, 1D CNN ("depression CNN"). Separate data augmentation methods are designed for training the vowel CNN and the depression CNN. We investigate the performance of the proposed system at three temporal granularities, modeling short, medium, and long analysis windows that correspond to 10, 21, and 42 utterances, respectively. The proposed method achieves performance comparable to previous state-of-the-art approaches and exhibits explainable properties with respect to the depression outcome. The findings from this work may benefit clinicians by providing additional intuition during joint human-ML decision-making tasks.
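
The two structural ideas in the abstract, pooling a variable-length utterance into a fixed-size embedding and grouping embeddings into analysis windows of 10, 21, or 42 utterances, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `spatial_pyramid_pool` and `group_into_windows`, the pyramid levels `(1, 2, 4)`, the use of max pooling along time only, and the non-overlapping windows are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence


def spatial_pyramid_pool(frames: Sequence[Sequence[float]],
                         levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool a (T x C) feature map into a vector of length C * sum(levels).

    Hypothetical helper: pools along the time axis only, with assumed levels.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    num_channels = len(frames[0])
    pooled: List[float] = []
    for bins in levels:
        for b in range(bins):
            # Floor for the start and ceiling for the end, so every bin
            # covers at least one frame even when num_frames < bins.
            start = (b * num_frames) // bins
            end = ((b + 1) * num_frames + bins - 1) // bins
            segment = frames[start:end]
            for c in range(num_channels):
                pooled.append(max(frame[c] for frame in segment))
    return pooled


def group_into_windows(embeddings: Sequence, window_size: int) -> List[Sequence]:
    """Split embeddings into consecutive non-overlapping windows.

    Assumption: an incomplete trailing window is dropped.
    """
    last_start = len(embeddings) - window_size
    return [embeddings[i:i + window_size]
            for i in range(0, last_start + 1, window_size)]


if __name__ == "__main__":
    # Two "utterances" of different lengths (7 and 50 frames, 3 channels each).
    short = [[t, -t, t % 3] for t in range(7)]
    long_ = [[t, -t, t % 3] for t in range(50)]
    print(len(spatial_pyramid_pool(short)), len(spatial_pyramid_pool(long_)))

    # Group 45 utterance embeddings into the three window sizes from the abstract.
    embeddings = [spatial_pyramid_pool(long_) for _ in range(45)]
    for size in (10, 21, 42):
        print(size, len(group_into_windows(embeddings, size)))
```

Because the output length depends only on the channel count and the pyramid levels, utterances of any duration yield embeddings of the same size, which is what allows a fixed-shape window of them to be passed to a downstream 1D classifier.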

Tags:

Speech analysis and Language disorder Analysis

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
