MASKED AUTOENCODERS ARE ARTICULATORY LEARNERS

Ahmed A Attia (University of Maryland, College Park); Carol Y Espy-Wilson (University of Maryland)


SPS


07 Jun 2023

Articulatory recordings track the positions and motion of different articulators along the vocal tract. They are widely used to study speech production and to develop speech technologies such as articulatory-based speech synthesizers and speech inversion systems. The University of Wisconsin X-Ray Microbeam (XRMB) dataset is one of several datasets that provide articulatory recordings synchronized with audio recordings. The XRMB articulatory recordings rely on pellets placed on a number of articulators, which the microbeam tracks. However, a significant portion of the articulatory recordings are mistracked and have so far been unusable. In this work, we present a deep-learning-based approach using Masked Autoencoders to accurately reconstruct the mistracked articulatory recordings for 41 of the 47 speakers in the XRMB dataset. Our model reconstructs articulatory trajectories that closely match the ground truth, even when three of the eight articulators are mistracked, and retrieves 3.28 of the 3.4 hours of previously unusable recordings.
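The core masked-autoencoder idea behind this kind of reconstruction can be illustrated with a short sketch: hide some articulators in the input, then score a reconstruction only on the hidden ones. This is not the authors' model or training setup, which the abstract does not describe. It assumes each frame holds the eight articulators mentioned above as (x, y) pellet coordinates, that masking replaces a hidden articulator with zeros, and that the loss is mean squared error over masked positions only. The names `mask_articulators`, `masked_mse`, and `NUM_ARTICULATORS` are hypothetical.

```python
import random

# Assumption: 8 tracked articulators per frame, each stored as an (x, y) pair.
NUM_ARTICULATORS = 8


def mask_articulators(frames, num_masked, rng):
    """Hide `num_masked` randomly chosen articulators in every frame.

    frames: list of frames; each frame is a list of NUM_ARTICULATORS (x, y) tuples.
    Returns a new list of frames with hidden articulators set to (0.0, 0.0),
    plus the sorted list of hidden articulator indices. The input is not modified.
    """
    masked_ids = sorted(rng.sample(range(NUM_ARTICULATORS), num_masked))
    masked_frames = [
        [(0.0, 0.0) if a in masked_ids else xy for a, xy in enumerate(frame)]
        for frame in frames
    ]
    return masked_frames, masked_ids


def masked_mse(prediction, target, masked_ids):
    """Mean squared error computed only over the hidden articulators.

    Scoring only masked positions is the usual masked-autoencoder objective:
    the model gets no credit for copying inputs it could already see.
    """
    total, count = 0.0, 0
    for pred_frame, true_frame in zip(prediction, target):
        for a in masked_ids:
            for p, t in zip(pred_frame[a], true_frame[a]):
                total += (p - t) ** 2
                count += 1
    return total / count
```

In a full system, a trained network would map the masked frames to a prediction that replaces the zeros. During training, the hidden articulators would be sampled from well-tracked data, as above. During reconstruction, they would instead be the articulators actually flagged as mistracked (for example, up to three of eight, per the abstract).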

Tags:

Audio and speech modeling, coding and transmission


Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

Wireless Deep Speech Semantic Transmission

Play It Back: Iterative Attention for Audio Recognition

CONTRASTIVE SPEECH MIXUP FOR LOW-RESOURCE KEYWORD SPOTTING
