NEURAL FULL-RANK SPATIAL COVARIANCE ANALYSIS FOR BLIND SOURCE SEPARATION
Yoshiaki Bando, Kouhei Sekiguchi, Aditya Arie Nugraha, Mathieu Fontaine, Yoshiki Masuyama, Kazuyoshi Yoshii
Length: 00:10:09
This paper describes a neural blind source separation (BSS) method based on a non-linear generative model of mixture signals. A classical statistical approach to BSS is to fit a linear generative model consisting of a spatial model and a source model, which represent the inter-channel covariances and the power spectral densities of the sources, respectively. Although the variational autoencoder (VAE) has been used successfully as a non-linear source model with latent features, it must be pretrained on a sufficient amount of isolated source signals. Our method, in contrast, enables the VAE-based source model to be trained from mixture signals alone. Specifically, we introduce a neural mixture-to-feature inference model that directly infers the latent features from the observed mixture, and we integrate it with a neural feature-to-mixture generative model consisting of a full-rank spatial model and a VAE-based source model. All the models are optimized jointly so as to maximize the likelihood of the training mixtures. Once optimized, the inference model can be used to estimate the latent features of the sources in unseen mixtures. Experimental results show that our method outperformed state-of-the-art BSS methods based on linear generative models and performed comparably to a method based on supervised learning of the VAE-based source model.
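As an illustration of the full-rank spatial model mentioned in the abstract, the following is a minimal NumPy sketch of the standard local Gaussian formulation used in full-rank spatial covariance analysis, in which each time-frequency bin of the mixture is modeled as a zero-mean complex Gaussian with covariance sum_n lam[n, f, t] * H[n, f]. The abstract does not spell out these equations, so this formulation, the function names, and the array shapes are assumptions made for illustration only. The power spectral densities `lam` are random placeholders here; in the described method they would come from the VAE-based source model, driven by latent features estimated by the inference model.

```python
import numpy as np

def mixture_covariance(lam, H):
    """Mixture covariance Y[f, t] = sum_n lam[n, f, t] * H[n, f].

    lam: (N, F, T) non-negative source power spectral densities (assumed shape).
    H:   (N, F, C, C) Hermitian positive definite spatial covariance matrices.
    """
    return np.einsum("nft,nfcd->ftcd", lam, H)

def neg_log_likelihood(X, lam, H):
    """Negative log-likelihood of the mixture STFT X (F, T, C), up to a constant,
    under a zero-mean circular complex Gaussian with covariance Y[f, t]."""
    Y = mixture_covariance(lam, H)
    _, logdet = np.linalg.slogdet(Y)                    # log|det Y[f, t]|, shape (F, T)
    Yinv_x = np.linalg.solve(Y, X[..., None])[..., 0]   # Y^{-1} x, shape (F, T, C)
    quad = np.einsum("ftc,ftc->ft", X.conj(), Yinv_x).real  # x^H Y^{-1} x
    return float(np.sum(logdet + quad))

def wiener_separate(X, lam, H):
    """Multichannel Wiener filter: image of source n is lam * H @ Y^{-1} x."""
    Y = mixture_covariance(lam, H)
    Yinv_x = np.linalg.solve(Y, X[..., None])[..., 0]
    return np.einsum("nft,nfcd,ftd->nftc", lam, H, Yinv_x)

# Toy example with random placeholder parameters (not data from the paper).
rng = np.random.default_rng(0)
N, F, T, C = 2, 4, 5, 3  # sources, frequency bins, frames, channels
A = rng.standard_normal((N, F, C, C)) + 1j * rng.standard_normal((N, F, C, C))
H = A @ A.conj().swapaxes(-1, -2) + 1e-3 * np.eye(C)  # Hermitian positive definite
lam = rng.random((N, F, T)) + 0.1
X = rng.standard_normal((F, T, C)) + 1j * rng.standard_normal((F, T, C))

nll = neg_log_likelihood(X, lam, H)
images = wiener_separate(X, lam, H)  # (N, F, T, C) estimated source images
```

In this sketch the Wiener-filtered source images sum back to the observed mixture by construction, and `neg_log_likelihood` is the kind of objective that a jointly trained inference and generative model could minimize over training mixtures. The actual training objective of the paper may include further terms, such as a regularizer on the latent features.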