Multi-Scale Speaker Diarization With Neural Affinity Score Fusion

Taejin Park, Manoj Kumar, Shrikanth Narayanan

SPS

Length: 00:10:31

11 Jun 2021

Identifying the speaker of short speech segments in human dialogue has been considered one of the most challenging problems in speech signal processing. Speaker representations extracted from short speech segments tend to be unreliable, which degrades performance in tasks that require speaker recognition. In this paper, we propose an unconventional method that tackles the trade-off between temporal resolution and the quality of the speaker representations. A neural affinity score fusion model is presented to find a set of weights that balance the scores from multiple temporal scales of segments. Using the CALLHOME dataset, we show that the proposed multi-scale segmentation and integration approach achieves state-of-the-art diarization performance.
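The idea of balancing scores from several temporal scales can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each scale provides segment center times and speaker embeddings, that coarser segments are mapped onto the finest-scale segments by nearest center time, and that the fusion is a weighted sum of cosine affinities. The names `cosine`, `nearest_segment`, and `fuse_affinities` are hypothetical, and the weights are fixed by hand here, whereas the abstract describes a neural model that finds them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def nearest_segment(center, centers):
    """Index of the segment whose center time is closest to `center`."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - center))

def fuse_affinities(scales, weights):
    """Weighted sum of per-scale cosine affinities on the finest-scale grid.

    scales:  list of (centers, embeddings) pairs, one per scale; the last
             entry is treated as the finest (base) scale. (Assumed layout.)
    weights: one non-negative weight per scale. These are hand-picked
             placeholders, not weights learned by a neural model.
    """
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize so weights sum to 1
    base_centers, _ = scales[-1]
    n = len(base_centers)
    fused = [[0.0] * n for _ in range(n)]
    for (centers, embeddings), w in zip(scales, weights):
        # Map each base-scale segment to the closest segment at this scale.
        idx = [nearest_segment(c, centers) for c in base_centers]
        for i in range(n):
            for j in range(n):
                fused[i][j] += w * cosine(embeddings[idx[i]], embeddings[idx[j]])
    return fused

# Toy input: two coarse segments and four fine segments (made-up numbers).
coarse = ([0.75, 2.25], [[1.0, 0.0], [0.0, 1.0]])
fine = ([0.5, 1.0, 2.0, 2.5], [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
fused = fuse_affinities([coarse, fine], [0.5, 0.5])
```

In this toy input, fine segments 0 and 1 fall under the same coarse segment, so their fused affinity is higher than the affinity between segments 0 and 2. The fused matrix could then be passed to a clustering step to assign speaker labels.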

Chairs:

Man-Wai Mak

Tags:

signal processing society

IEEE ICASSP 2021

virtual conference

June 6-11, 2021
