TOWARDS IDENTITY PRESERVING NORMAL TO DYSARTHRIC VOICE CONVERSION

Wen-Chin Huang, Lester Phillip Violeta, Tomoki Toda, Bence Mark Halpern, Odette Scharenborg

Length: 00:13:13

09 May 2022

We present a voice conversion framework that converts normal speech into dysarthric speech while preserving speaker identity. Such a framework is essential for (1) clinical decision-making and the alleviation of patient stress, and (2) data augmentation for dysarthric speech recognition. The task is especially challenging because the converted samples should capture the severity of dysarthric speech while remaining highly natural and retaining the identity of the normal speaker. To this end, we adopted a two-stage framework consisting of a sequence-to-sequence model and a nonparallel frame-wise model. Objective and subjective evaluations on the UASpeech dataset showed that the method yields reasonable naturalness and captures severity aspects of the pathological speech. However, similarity to the normal source speaker's voice was limited and requires further improvement.

Tags: autoencoder, dysarthric speech, pathological speech, sequence-to-sequence modeling, voice conversion

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
