Data Augmenting Contrastive Learning Of Speech Representations In The Time Domain
Eugene Kharitonov, Morgane Rivière, Gabriel Synnaeve, Lior Wolf, Pierre-Emmanuel Mazaré, Matthijs Douze, Emmanuel Dupoux
Contrastive Predictive Coding (CPC), based on predicting future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of the speech signal. However, it still underperforms compared to other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library which we adapt and optimize for the specificities of CPC (raw waveform input, contrastive loss, past-versus-future structure). We find that applying augmentation only to the past segments from which the CPC prediction is performed yields better results than also applying it to the future segments from which the samples (both positive and negative) of the contrastive loss are drawn. After selecting the best combination of pitch modification, additive noise and reverberation on unsupervised metrics on LibriSpeech (with a gain of 18-22% relative on the ABX score), we apply this combination without any change to three new datasets from the Zero Resource Speech Benchmark 2017 and beat the state of the art using out-of-domain training data. Finally, we show that the data-augmented pretrained features improve a downstream phone recognition task in the Libri-light semi-supervised setting (10 min, 1 h or 10 h of labelled data), reducing the PER by 15% relative.
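The core idea, augmenting only the past (context) window in the time domain while leaving the future samples used by the contrastive loss untouched, can be sketched with standard tools. The snippet below is a minimal illustration, assuming torchaudio's sox-effects interface rather than the WavAugment library itself; the helper names (`augment_past`, `add_noise`) and all effect parameters are hypothetical and not the configuration selected in the paper.

```python
import torch
import torchaudio

def augment_past(waveform: torch.Tensor, sample_rate: int = 16000) -> torch.Tensor:
    """Pitch modification + reverberation on the past (context) segment only.

    Illustrative sketch: parameter ranges are placeholders, not the
    paper's selected combination.
    """
    effects = [
        ["pitch", str(int(torch.randint(-300, 300, (1,)).item()))],  # random pitch shift, in cents
        ["rate", str(sample_rate)],    # resample back to the original rate after the pitch effect
        ["reverb", "50", "50", "30"],  # light reverberation (reverberance, HF damping, room scale)
        ["channels", "1"],             # collapse back to mono if reverb produced two channels
    ]
    augmented, _ = torchaudio.sox_effects.apply_effects_tensor(waveform, sample_rate, effects)
    return augmented

def add_noise(waveform: torch.Tensor, noise: torch.Tensor, snr_db: float = 15.0) -> torch.Tensor:
    """Mix in additive noise at a target SNR (illustrative value)."""
    sig_power = waveform.pow(2).mean()
    noise_power = noise.pow(2).mean()
    scale = torch.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return waveform + scale * noise

# Usage: only the past window fed to the CPC encoder is augmented;
# the future window, from which positive and negative samples are drawn,
# is left unchanged.
past, future = torch.randn(1, 16000), torch.randn(1, 16000)
past_aug = add_noise(augment_past(past), torch.randn(1, 16000))
# The CPC loss then predicts codes of the unaugmented `future` from `past_aug`.
```

This reflects the asymmetric scheme described in the abstract: perturbing the prediction context forces the model to be invariant to the augmentations, while keeping the contrastive targets clean avoids corrupting the samples being discriminated.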