Integrating End-To-End Neural And Clustering-Based Diarization: Getting The Best Of Both Worlds

Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara

Length: 00:14:17

11 Jun 2021

Diarization technologies can be categorized into two approaches, i.e., clustering-based and end-to-end neural approaches, which have different pros and cons. Clustering-based approaches assign speaker labels to speech regions by clustering speaker embeddings such as x-vectors. While they can be seen as the current state of the art, working on various challenging data with reasonable robustness and accuracy, they have a critical disadvantage: they cannot handle overlapped speech, which is inevitable in natural conversational data. In contrast, end-to-end neural diarization (EEND), which directly predicts diarization labels using neural networks, was devised to handle overlapped speech. While EEND has started outperforming the x-vector clustering approach on some realistic datasets, it is difficult to make it work for long recordings (e.g., recordings longer than 10 minutes) because of, e.g., its huge memory consumption. Block-wise processing is also difficult because it poses an inter-block label permutation problem, i.e., ambiguity in the speaker label assignments between blocks. In this paper, we propose a simple but effective hybrid diarization framework that works with overlapped speech and for long recordings containing an arbitrary number of speakers, and show that it works significantly better than the original EEND, especially when the input data is long.
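The inter-block label permutation problem mentioned in the abstract can be illustrated with a toy sketch. The example below is not the method presented in the talk; it only assumes, for illustration, that each block yields one embedding per local speaker, and it aligns the second block to the first by exhaustively searching permutations. All names (`best_permutation`, `align_block`) and the toy embeddings are hypothetical.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_permutation(ref_embs, blk_embs):
    """Return perm such that blk_embs[perm[i]] best matches ref_embs[i].

    Exhaustive search over all orderings; only practical for a few speakers.
    """
    n = len(ref_embs)
    return min(
        permutations(range(n)),
        key=lambda p: sum(sq_dist(ref_embs[i], blk_embs[p[i]]) for i in range(n)),
    )


def align_block(ref_embs, blk_embs, blk_activity):
    """Reorder a block's per-speaker activity rows to follow the reference order."""
    perm = best_permutation(ref_embs, blk_embs)
    return [blk_activity[j] for j in perm]


# Toy data (assumed): block 1 lists speaker A then B,
# while block 2 happens to list speaker B first.
block1_embs = [(1.0, 0.0), (0.0, 1.0)]
block2_embs = [(0.1, 0.9), (0.9, 0.1)]
# Rows follow block 2's local speaker order; columns are frames.
# Both speakers are active in the middle frame (overlapped speech).
block2_activity = [[0, 1, 1], [1, 1, 0]]

aligned = align_block(block1_embs, block2_embs, block2_activity)
print(aligned)
```

Without the alignment step, row 0 of block 2 would be attributed to the wrong speaker when the blocks are concatenated. Exhaustive search grows factorially with the number of speakers, so a larger setup would likely need an assignment solver or a clustering step instead.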

Chairs:

Man-Wai Mak

Tags:

signal processing society

IEEE icassp 2021

virtual conference

june 6-11 2021
