Skip to main content

Speaker Diarization Using Latent Space Clustering In Generative Adversarial Network

Monisankha Pal, Manoj Kumar, Raghuveer Peri, Taejin Park, So Hyun Kim, Catherine Lord, Somer Bishop, Shrikanth Narayanan

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 13:38
04 May 2020

In this work, we propose deep latent space clustering for speaker diarization using generative adversarial network (GAN) back-projection with the help of an encoder network. The proposed diarization system is trained jointly with GAN loss, latent variable recovery loss, and a clustering-specific loss. It uses x-vector speaker embeddings at the input, while the latent variables are sampled from a combination of continuous random variables and discrete one-hot encoded variables using the original speaker labels. We benchmark our proposed system on the AMI meeting corpus, and two child-clinician interaction corpora (ADOS and BOSCC) from the autism diagnosis domain. ADOS and BOSCC contain diagnostic and treatment outcome sessions respectively obtained in clinical settings for verbal children and adolescents with autism. Experimental results show that our proposed system significantly outperform the state-of-the-art x-vector based diarization system on these databases. Further, we perform embedding fusion with x-vectors to achieve a relative diarization error rate (DER) improvement of 31%, 36% and 49% on AMI eval, ADOS and BOSCC corpora respectively, when compared to the x-vector baseline using oracle speech segmentation.

Value-Added Bundle(s) Including this Product

More Like This

  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00
  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00
  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00