A MULTITASK LEARNING FRAMEWORK FOR SPEAKER CHANGE DETECTION WITH CONTENT INFORMATION FROM UNSUPERVISED SPEECH DECOMPOSITION
Hang Su, Danyang Zhao, Long Dang, Xixin Wu, Xunying Liu, Helen Meng, Minglei Li
Speaker Change Detection (SCD) is the task of determining the time boundaries between speech segments uttered by different speakers. SCD systems can be applied to many tasks, such as speaker diarization, speaker tracking, and transcription of audio with multiple speakers. Recent advances in deep learning have led to neural approaches that detect speaker change points directly from audio at the frame level. These approaches may be further improved by utilizing speaker information available in the training data and content information extracted in an unsupervised manner. This work proposes a novel framework for the SCD task that uses a multitask learning architecture to leverage speaker information during the training stage and incorporates content information extracted from an unsupervised speech decomposition model to help detect speaker change points. Experimental results show that the multitask learning architecture with speaker information improves SCD performance, and that adding content information from the unsupervised speech decomposition model improves it further. To the best of our knowledge, this work outperforms the previous state-of-the-art SCD results on the AMI dataset.
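To illustrate the general idea of frame-level multitask SCD described above, the following is a minimal PyTorch sketch, not the authors' actual architecture: a shared encoder consumes per-frame acoustic features concatenated with content features (e.g., from an unsupervised speech decomposition model), and feeds two heads, a primary speaker-change detector and an auxiliary speaker classifier used only during training. All layer sizes, the speaker count, and the loss weight `alpha` are placeholder assumptions.

```python
# Hypothetical sketch of a multitask SCD model (not the paper's exact model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultitaskSCD(nn.Module):
    def __init__(self, acoustic_dim=80, content_dim=64, hidden_dim=256, num_speakers=100):
        super().__init__()
        # Shared recurrent encoder over concatenated acoustic + content features.
        self.encoder = nn.LSTM(acoustic_dim + content_dim, hidden_dim,
                               num_layers=2, batch_first=True, bidirectional=True)
        # Primary head: per-frame speaker-change logits.
        self.change_head = nn.Linear(2 * hidden_dim, 1)
        # Auxiliary head: per-frame speaker-identity logits (training only).
        self.speaker_head = nn.Linear(2 * hidden_dim, num_speakers)

    def forward(self, acoustic, content):
        x = torch.cat([acoustic, content], dim=-1)   # (B, T, acoustic_dim + content_dim)
        h, _ = self.encoder(x)                       # (B, T, 2 * hidden_dim)
        return self.change_head(h).squeeze(-1), self.speaker_head(h)

def multitask_loss(change_logits, speaker_logits, change_labels, speaker_labels, alpha=0.5):
    # Joint objective: change-detection BCE plus a weighted speaker-classification term.
    bce = F.binary_cross_entropy_with_logits(change_logits, change_labels.float())
    ce = F.cross_entropy(speaker_logits.transpose(1, 2), speaker_labels)
    return bce + alpha * ce
```

At inference time only the change head would be used, with per-frame change probabilities thresholded or peak-picked to produce boundary timestamps; the speaker head serves purely as an auxiliary training signal.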