Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0

Marie Kunešová (University of West Bohemia); Zbyněk Zajíc (University of West Bohemia)

06 Jun 2023

Self-supervised learning approaches have lately achieved great success on a broad spectrum of machine learning problems. In the field of speech processing, one of the most successful recent self-supervised models is wav2vec 2.0. In this paper, we explore the effectiveness of this model on three basic speech classification tasks: speaker change detection, overlapped speech detection, and voice activity detection. First, we concentrate on only one task – speaker change detection – where our proposed system surpasses the previously reported results on four different corpora, and achieves comparable performance even when trained on out-of-domain data from an artificially designed dataset. Then we expand our approach to tackle all three tasks in a single multitask system with state-of-the-art performance on the AMI corpus. The implementation of the algorithms in this paper is publicly available at https://github.com/mkunes/w2v2_audioFrameClassification.
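The abstract treats all three problems as speech classification tasks, and the repository name suggests classification at the level of audio frames. The minimal sketch below is one way per-frame training targets for voice activity (VAD), overlapped speech (OSD) and speaker change detection (SCD) could be derived from speaker-segment annotations. It assumes the 20 ms output frame stride of wav2vec 2.0 for 16 kHz audio. The `Segment` class, the `frame_labels` and `change_points` functions, the change-point rule and the 0.2 s tolerance are all hypothetical illustrations, not the authors' implementation. The linked repository is the authoritative source for that.

```python
from dataclasses import dataclass

# wav2vec 2.0's convolutional encoder has a total stride of 320 samples,
# i.e. one output frame every 20 ms for 16 kHz audio.
FRAME_STRIDE_S = 0.02


@dataclass
class Segment:
    """Hypothetical annotation unit: one speaker talking from start to end (seconds)."""
    speaker: str
    start: float
    end: float


def change_points(segments):
    """Hypothetical rule: a speaker change occurs at the start of every segment
    whose speaker differs from the speaker of the preceding segment (sorted by start)."""
    ordered = sorted(segments, key=lambda s: s.start)
    return [cur.start for prev, cur in zip(ordered, ordered[1:])
            if cur.speaker != prev.speaker]


def frame_labels(segments, duration, stride=FRAME_STRIDE_S, change_tolerance=0.2):
    """Return binary per-frame targets for the three tasks.

    vad: at least one speaker is active at the frame centre
    ovl: two or more speakers are active at the frame centre
    scd: the frame centre lies within +/- change_tolerance seconds of a change point
    """
    n_frames = int(round(duration / stride))
    changes = change_points(segments)
    vad, ovl, scd = [], [], []
    for i in range(n_frames):
        t = (i + 0.5) * stride  # centre of frame i
        active = {s.speaker for s in segments if s.start <= t < s.end}
        vad.append(int(len(active) >= 1))
        ovl.append(int(len(active) >= 2))
        scd.append(int(any(abs(t - c) <= change_tolerance for c in changes)))
    return {"vad": vad, "ovl": ovl, "scd": scd}


if __name__ == "__main__":
    # Speaker A talks for 0-1.0 s, speaker B for 0.8-2.0 s, so they overlap for 0.2 s.
    segs = [Segment("A", 0.0, 1.0), Segment("B", 0.8, 2.0)]
    labels = frame_labels(segs, duration=2.0)
    for task, targets in labels.items():
        print(task, sum(targets), "positive frames out of", len(targets))
```

In a multitask setup of the kind the abstract describes, the three target sequences would be predicted jointly from the same frame-level wav2vec 2.0 representations, for example with one output per task. How the paper actually combines them is not stated here.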

Tags:

Word spotting, VAD, and other topics in speech recognition

Presented at IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person conference).

More Like This

The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis

Federated Learning for ASR Based on wav2vec 2.0

Neural Diarization with Non-autoregressive Intermediate Attractors