Utilizing Wav2vec in Database-independent Voice Disorder Detection

Saska Tirronen (Aalto University); Farhad Javanmardi (Aalto University); Manila Kodali (Aalto University); Sudarsana Reddy Kadiri (Aalto University); Paavo Alku (Aalto University)

07 Jun 2023

Automatic detection of voice disorders from acoustic speech signals can help improve the reliability of medical diagnosis. However, the real-life environment in which speech is recorded for diagnosis can differ from the environment in which the detection system’s training data was collected. This mismatch in recording conditions can degrade detection performance in practice. In this work, we propose using a pre-trained wav2vec 2.0 model as a feature extractor for building automatic voice disorder detection systems. The embeddings from the first layers of the context network contain information about phones, and these features are useful in voice disorder detection. We evaluate the wav2vec features in single-database and cross-database scenarios to study how well they generalize to unseen speakers and recording conditions. The results indicate that the wav2vec features generalize better than popular spectral and cepstral baseline features.
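The difference between the single-database and cross-database scenarios can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the feature vectors are synthetic stand-ins for utterance-level embeddings (frame-level vectors averaged over time), the `shift` parameter is a made-up proxy for a change in recording conditions, and the nearest-centroid classifier is a placeholder rather than the classifier used in the paper. No real database or wav2vec model is involved.

```python
import random
from statistics import fmean


def mean_pool(frames):
    """Average a list of equal-length vectors into a single vector."""
    return [fmean(column) for column in zip(*frames)]


def make_database(n_utts, shift, seed):
    """Build a synthetic database of (utterance_vector, label) pairs.

    Hypothetical stand-in for real data: `shift` offsets every frame to
    mimic a different recording condition. Label 0 = healthy, 1 = disordered.
    """
    rng = random.Random(seed)
    data = []
    for i in range(n_utts):
        label = i % 2
        frames = [
            [rng.gauss(2.0 * label + shift, 1.0) for _ in range(4)]
            for _ in range(20)
        ]
        data.append((mean_pool(frames), label))
    return data


def fit_centroids(train):
    """Placeholder classifier: one mean vector per class."""
    return {
        label: mean_pool([x for x, y in train if y == label])
        for label in (0, 1)
    }


def predict(centroids, x):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def accuracy(centroids, test):
    return sum(predict(centroids, x) == y for x, y in test) / len(test)


db_a = make_database(40, shift=0.0, seed=0)  # "training" database
db_b = make_database(40, shift=0.5, seed=1)  # unseen database

centroids = fit_centroids(db_a[:30])

# Single-database: test on held-out utterances from the same database.
single_db_acc = accuracy(centroids, db_a[30:])
# Cross-database: test on a database never seen during training.
cross_db_acc = accuracy(centroids, db_b)

print(f"single-database accuracy: {single_db_acc:.2f}")
print(f"cross-database accuracy:  {cross_db_acc:.2f}")
```

The key design point is that the cross-database score is computed with a model that has seen no data from the test database, so it reflects robustness to new speakers and recording conditions rather than fit to one corpus.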

Tags:

Speech analysis and Language disorder Analysis

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

