Toward Better Speaker Embeddings: Automated Collection Of Speech Samples From Unknown Distinct Speakers

Minh Pham, Zeqian Li, Jacob Whitehill

Length: 12:16

04 May 2020

The accuracy of speaker verification and diarization models depends on the quality of the speaker embeddings used to separate audio samples from different speakers. With the goal of training better embedding models, we devise an automatic pipeline for large-scale collection of speech samples from unique speakers that is significantly more automated than previous approaches. With this pipeline, we collect and publish the BookTubeSpeech dataset, containing 8,450 YouTube videos (7.74 min per video on average), each of which contains a single unique speaker. Using this dataset combined with VoxCeleb2, we show a substantial improvement in the quality of embeddings when tested on LibriSpeech compared to a model trained on only VoxCeleb2.
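For a sense of scale, the two figures quoted in the abstract (8,450 videos, 7.74 minutes per video on average) imply roughly 1,090 hours of audio. The short sketch below only reproduces that arithmetic; the function name `total_hours` is hypothetical and introduced for illustration, and because the per-video average is rounded, the result is an estimate rather than a figure reported by the authors.

```python
def total_hours(num_videos: int, avg_minutes_per_video: float) -> float:
    """Estimate total audio duration in hours from a video count and a mean length."""
    total_minutes = num_videos * avg_minutes_per_video
    return total_minutes / 60.0


# Figures taken from the abstract; the average is rounded, so this is approximate.
NUM_VIDEOS = 8450
AVG_MINUTES = 7.74

if __name__ == "__main__":
    hours = total_hours(NUM_VIDEOS, AVG_MINUTES)
    print(f"Estimated total duration: {hours:.0f} hours")
```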

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020
