Automatic Elicitation Compliance For Short-Duration Speech Based Depression Detection
Brian Stasak, Zhaocheng Huang, Dale Joachim, Julien Epps
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 00:15:32
Detecting depression from the voice in naturalistic environments is challenging, particularly for short-duration audio recordings. This increases the need to interpret and make optimal use of elicited speech. The rapid consonant-vowel syllable combination ‘pataka’ is frequently selected as a clinical motor-speech task. However, elicited recordings vary considerably, and the effect of this variability has yet to be investigated. In this multi-corpus study of over 25,000 ‘pataka’ utterances, speech landmark-based features were found to be sensitive to the number of ‘pataka’ utterances per recording. This sensitivity was then exploited to automatically estimate ‘pataka’ count and rate, achieving root mean square errors nearly three times lower than chance level. When this count and rate knowledge is applied to depression detection, the results show that the estimated ‘pataka’ count and rate are important for normalizing evaluative ‘pataka’ speech data. Count- and/or rate-normalized ‘pataka’ models produced relative reductions in depression classification error of up to 26% compared with non-normalized models.
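The evaluation quantities mentioned above (RMSE of the count estimate against a chance-level baseline, ‘pataka’ rate, and relative error reduction) can be illustrated with a minimal sketch. It assumes that "chance level" means always predicting the training-set mean count and that "rate" means utterances per second of recording; these definitions, and all function names below, are hypothetical illustrations rather than the authors' stated method.

```python
import math
from statistics import mean


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(mean((p - a) ** 2 for p, a in zip(predicted, actual)))


def chance_rmse(train_counts, test_counts):
    """Assumed chance-level baseline: always predict the training-set mean count."""
    baseline = mean(train_counts)
    return rmse([baseline] * len(test_counts), test_counts)


def pataka_rate(count, duration_s):
    """Assumed rate definition: 'pataka' utterances per second of recording."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return count / duration_s


def relative_error_reduction(baseline_error, new_error):
    """Fractional reduction of new_error relative to baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical numbers: a drop in classification error from 0.50 to 0.37
# corresponds to a relative reduction of 0.26, i.e. 26%.
print(round(relative_error_reduction(0.50, 0.37), 2))
```

Note that a relative reduction is measured against the baseline error, so a 26% relative reduction is not the same as a 26 percentage-point drop in error.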
Chairs:
Mathew Magimai Doss