On Using the UA-Speech and TORGO Databases to Validate Automatic Dysarthric Speech Classification Approaches

Guilherme Schu (Idiap Research Institute); Parvaneh Janbakhshi (Bayer AG); Ina Kodrasi (Idiap Research Institute)

06 Jun 2023

Although the UA-Speech and TORGO databases of control and dysarthric speech are invaluable resources made available to the research community with the objective of developing robust automatic speech recognition systems, they have also been used to validate a considerable number of automatic dysarthric speech classification approaches. Such approaches typically rely on the underlying assumption that recordings from control and dysarthric speakers are collected in the same noiseless environment using the same recording setup. In this paper, we show that this assumption is violated for the UA-Speech and TORGO databases. Using voice activity detection to extract speech and non-speech segments, we show that the majority of state-of-the-art dysarthria classification approaches achieve the same or considerably better performance when using the non-speech segments of these databases than when using the speech segments. These results demonstrate that such approaches, when trained and validated on the UA-Speech and TORGO databases, are potentially learning characteristics of the recording environment or setup rather than dysarthric speech characteristics. We hope that these results raise awareness in the research community about the importance of recording quality when developing and evaluating automatic dysarthria classification approaches.
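
The segment-splitting step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which voice activity detector was used, so the code below swaps in a simple frame-energy threshold as a stand-in. The function name `split_speech_nonspeech`, the 25 ms frame length, and the -35 dB threshold are all hypothetical choices made for illustration, and the input is a synthetic signal rather than a recording from either database.

```python
import numpy as np


def split_speech_nonspeech(signal, sample_rate, frame_ms=25.0, threshold_db=-35.0):
    """Energy-threshold stand-in for a VAD (an assumption, not the paper's detector).

    Returns (speech, non_speech) as 1-D arrays of concatenated frames.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000.0)
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        # Signal shorter than one frame: nothing to classify.
        return np.array([]), np.array([])

    # Cut the signal into non-overlapping frames, dropping the tail remainder.
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)

    # Per-frame RMS energy in dB relative to the loudest frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    ref = max(rms.max(), 1e-12)
    energy_db = 20.0 * np.log10(np.maximum(rms, 1e-12) / ref)

    # Frames above the (hypothetical) threshold are treated as speech.
    is_speech = energy_db > threshold_db
    return frames[is_speech].reshape(-1), frames[~is_speech].reshape(-1)


# Synthetic example: 1 s of low-level noise followed by 1 s of a 200 Hz tone.
sample_rate = 16000
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.001, sample_rate)
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 200 * t) + rng.normal(0.0, 0.001, sample_rate)

speech, non_speech = split_speech_nonspeech(np.concatenate([noise, tone]), sample_rate)
print(len(speech), len(non_speech))
```

In an experiment of the kind the abstract describes, the same classifier would then be trained and evaluated once on the speech output and once on the non-speech output. Comparable or better accuracy on the non-speech output would suggest that the classifier is picking up recording conditions rather than speech characteristics.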

Tags: Speech Analysis and Language Disorder Analysis

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
