Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models

Sathvik Udupa (Indian Institute of Science); Siddarth C (Robert Bosch Centre for Data Science and AI, Indian Institute of Technology Madras); Prasanta Dr Ghosh (Indian Institute of Science (IISc), Bangalore)

07 Jun 2023

In this work, we investigate the effectiveness of pretrained self-supervised learning (SSL) features for learning the acoustic-to-articulatory inversion (AAI) mapping. Signal processing-based acoustic features such as MFCCs have been the predominant input to deep neural networks for the AAI task. Since SSL features work well for various other speech tasks, such as speech recognition and emotion classification, we examine their efficacy for AAI. We train transformer-based AAI models of three different complexities on SSL features and compare their performance with that of MFCCs in subject-specific (SS), pooled, and fine-tuned (FT) configurations, using data from 10 subjects and evaluating with the correlation coefficient (CC) on a test set of unseen sentences. We find that SSL features trained with an acoustic feature reconstruction objective, such as TERA and DeCoAR, work well for AAI, with the SS CCs of these SSL features coming close to the best FT CCs of MFCCs. We also find that the results are consistent across model sizes.
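The abstract reports results as a correlation coefficient (CC) between predicted and measured articulatory trajectories. As a minimal sketch of how such a score could be computed, the following assumes the CC is the Pearson correlation taken per articulatory channel over the frames of one utterance and then averaged across channels. The function name `articulatory_cc`, the `(frames, channels)` array layout, and the choice of 12 channels in the usage lines are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def articulatory_cc(pred, target):
    """Return (mean CC, per-channel CC) for one utterance.

    pred, target: arrays of shape (frames, channels), where each column is
    one articulatory trajectory (hypothetical layout, assumed here).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    # Center each channel over time.
    pred_c = pred - pred.mean(axis=0)
    target_c = target - target.mean(axis=0)

    # Pearson correlation per channel: covariance over product of norms.
    num = (pred_c * target_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (target_c ** 2).sum(axis=0))
    per_channel = num / den  # a constant channel yields NaN

    return per_channel.mean(), per_channel


# Usage with synthetic data: a prediction that is a scaled and shifted copy
# of the target is perfectly correlated with it.
rng = np.random.default_rng(0)
target = rng.standard_normal((200, 12))  # 200 frames, 12 channels (assumed)
mean_cc, per_channel_cc = articulatory_cc(2.0 * target + 1.0, target)
```

Because Pearson correlation is invariant to scale and offset, this metric rewards predictions that follow the shape of each trajectory rather than its absolute position. How the scores are aggregated across utterances and subjects is not specified in the abstract and is left out of the sketch.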

Tags:

Speech production, perception and psychoacoustics

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
