DISPEECH: A SYNTHETIC TOY DATASET FOR SPEECH DISENTANGLING

Olivier Zhang, Nicolas Gengembre, Olivier Le Blouch, Damien Lolive

  • SPS
    Length: 00:07:50
13 May 2022

Interest in unsupervised learning of disentangled representations has grown recently, with successful applications to both synthetic and real data. In speech processing, such methods have been able to disentangle speakers' attributes from verbal content. To better understand disentanglement, synthetic data is necessary, as it provides a controllable framework in which to train models and evaluate disentanglement. Thus, we introduce diSpeech, a corpus of speech synthesized with the Klatt synthesizer. Its first version is constrained to vowels synthesized with 5 generative factors relying on pitch and formants. Experiments show the ability of variational autoencoders to disentangle these generative factors and assess the reliability of disentanglement metrics. In addition to supporting the benchmarking of speech disentanglement methods, diSpeech also enables the objective evaluation of disentanglement on real speech, which is, to our knowledge, unprecedented. To illustrate this methodology, we apply it to TIMIT's isolated vowels.
