A Comparison Of Self-Supervised Speech Representations As Input Features For Unsupervised Acoustic Word Embeddings

Lisa van Staden, Herman Kamper

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 0:15:24

19 Jan 2021

Many speech processing tasks involve measuring the acoustic similarity between speech segments. Acoustic word embeddings (AWE) allow for efficient comparisons by mapping speech segments of arbitrary duration to fixed-dimensional vectors. For zero-resource speech processing, where unlabelled speech is the only available resource, some of the best AWE approaches rely on weak top-down constraints in the form of automatically discovered word-like segments. Rather than learning embeddings at the \textit{segment level}, another line of zero-resource research has looked at representation learning at the short-time frame level. Recent approaches include self-supervised predictive coding and correspondence autoencoder~(CAE) models. In this paper we consider whether these frame-level features are beneficial when used as inputs for training to an unsupervised AWE model. We compare frame-level features from contrastive predictive coding (CPC), autoregressive predictive coding and a CAE to conventional MFCCs. These are used as inputs to a recurrent CAE-based AWE model. In a word discrimination task on English and Xitsonga data, all three representation learning approaches outperform MFCCs, with CPC consistently showing the biggest improvement. In cross-lingual experiments we find that CPC features trained on English can also be transferred to Xitsonga.

Tags:

sps conference

slt 2021

A Comparison Of Self-Supervised Speech Representations As Input Features For Unsupervised Acoustic Word Embeddings

Lisa van Staden, Herman Kamper

Value-Added Bundle(s) Including this Product

SLT 2021 Virtual Conference - Presentation Videos Product Bundle

More Like This

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

IEEE ICASSP 2024, 1 4-19 April 2024, Seoul, Korea. Conference Presentation Videos Bundle

ICIP 2022, October 16-19, 2022, Bordeaux, France - Presentation Videos Product Bundle

Join the IEEE Signal Processing Society