Skip to main content
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:14:33
08 Jun 2021

This paper presents a statistical method for single-channel speech dereverberation using a variational autoencoder (VAE) for modelling the speech spectra. One popular approach for modelling speech spectra is to use non-negative matrix factorization (NMF) where learned clean speech spectral bases are used as a linear generative model for speech spectra. This work replaces this linear model with a powerful nonlinear deep generative model based on VAE. Further, this paper formulates a unified probabilistic generative model of reverberant speech based on Gaussian and Poisson distributions. We develop a Monte Carlo expectation-maximization algorithm for inferring the latent variables in the VAE and estimating the room impulse response for both probabilistic models. Evaluation results show the superiority of the proposed VAE-based models over the NMF-based counterparts.

Chairs:
Takuya Yoshioka

Value-Added Bundle(s) Including this Product

More Like This

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00