MULTI-CHANNEL SPEAKER DIARIZATION USING SPATIAL FEATURES FOR MEETINGS
Naijun Zheng, XunYing Liu, Helen Meng, Na Li, Jianwei Yu, Chao Weng, Dan Su
SPS
Length: 00:13:09
Identifying speakers in overlapped speech is a major challenge for speaker diarization in meeting scenarios. To address it, several overlap-aware resegmentation methods based on deep learning have been integrated into speaker diarization systems. In this paper we propose two multi-channel diarization systems that learn spatial features to better detect overlapped speech and identify speakers. The first system applies a multi-look strategy to train networks without being given the speakers' direction of arrival (DOA), while the second estimates the DOA of target speakers from existing diarization results. Both systems aim to estimate the voice activity of speakers in different directions in order to handle overlapped speech. Experimental results on the AMI corpus show that the two systems achieve relative improvements of up to 9.4% and 18.1% in terms of diarization error rate (DER) over an overlap-aware single-channel system with a BeamformIt front-end.
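For readers unfamiliar with the metric, the following is a minimal sketch of how DER and a relative improvement over a baseline are commonly computed. It assumes the standard definition of DER as missed speech, false alarm, and speaker confusion time divided by total reference speech time; the function names and all durations are hypothetical and are not taken from the paper.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """DER = (missed + false alarm + confusion) / total reference speech time."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


def relative_improvement(baseline_der, system_der):
    """Relative DER reduction of a system with respect to a baseline."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - system_der) / baseline_der


# Hypothetical error durations in seconds, chosen only for illustration.
baseline = diarization_error_rate(60.0, 20.0, 20.0, 500.0)
improved = diarization_error_rate(50.0, 15.0, 17.0, 500.0)
gain = relative_improvement(baseline, improved)

print(f"baseline DER: {baseline:.3f}")
print(f"improved DER: {improved:.3f}")
print(f"relative improvement: {gain:.1%}")
```

A relative improvement is a fraction of the baseline error, so an 18% relative gain on a 20% baseline DER corresponds to an absolute reduction of 3.6 percentage points.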