THE SECRET SOURCE : INCORPORATING SOURCE FEATURES TO IMPROVE ACOUSTIC-TO-ARTICULATORY SPEECH INVERSION

Yashish M. Siriwardena (University of Maryland College Park); Carol Y Espy-Wilson (University of Maryland)

07 Jun 2023

In this work, we incorporate the acoustically derived source features aperiodicity, periodicity, and pitch as additional targets for an acoustic-to-articulatory speech inversion (SI) system. We also propose a temporal convolution-based SI system that takes auditory spectrograms as the input speech representation and learns long-range dependencies and complex interactions between the source and the vocal tract to improve the SI task. Experiments are conducted on both the Wisconsin X-ray microbeam (XRMB) and Haskins Production Rate Comparison (HPRC) datasets, with comparisons against three baseline SI model architectures. On the HPRC dataset, the proposed SI system gains an improvement of close to 28% when the source features are used as additional targets. The same SI system outperforms the current best-performing SI models by around 9% on the XRMB dataset.
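The two ideas in the abstract, stacked temporal convolutions that widen the context seen per frame and source features appended to the articulatory targets, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the kernel size, the dilation schedule, and the example frame values are all assumptions chosen for illustration, and the abstract does not state the actual architecture or target dimensions.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Length-preserving 1-D dilated convolution over time, zero padded.

    Hypothetical helper: assumes an odd kernel length so the kernel can be
    centred on each frame.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    centre = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - centre) * dilation
            if 0 <= idx < len(x):  # frames outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input frames that influence one output frame of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def build_targets(articulatory: Sequence[Sequence[float]],
                  source: Sequence[Sequence[float]]) -> List[List[float]]:
    """Append per-frame source features to the per-frame articulatory targets."""
    if len(articulatory) != len(source):
        raise ValueError("both target streams need the same number of frames")
    return [list(a) + list(s) for a, s in zip(articulatory, source)]


if __name__ == "__main__":
    # Assumed schedule: kernel size 3 with dilations doubling per layer.
    print(receptive_field(3, [1, 2, 4, 8]))  # 31 frames of context

    # One frame: two made-up articulatory values followed by
    # (aperiodicity, periodicity, pitch) as the extra targets.
    print(build_targets([[0.1, 0.2]], [[0.5, 0.5, 120.0]]))
```

Doubling the dilation at each layer grows the context roughly exponentially with depth while the parameter count grows only linearly, which is one common way a temporal convolution stack reaches long-range dependencies.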

Tags: Speech Analysis and Language Disorder Analysis

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
