04 May 2020

In this paper, we propose a multi-channel speech source separation method based on a deep neural network (DNN) that is trained without access to clean signals. In place of a clean signal, the proposed method uses a speech signal estimated by an unsupervised source separation method built on a statistical model. As the statistical model of the microphone input signal, we adopt a time-varying spatial covariance matrix (SCM) model that includes reverberation and background-noise submodels, which provides robustness against reverberation and background noise. The DNN infers the intermediate variables needed to construct the time-varying SCM. Separation is performed in a probabilistic manner to avoid overfitting to separation errors. Because there are multiple intermediate variables, a loss function that evaluates a single intermediate variable is not applicable; instead, the proposed method adopts a loss function that directly evaluates the output probabilistic signal based on the Kullback-Leibler divergence (KLD). The gradient of this loss can be back-propagated into the DNN through all the intermediate variables. Experimental results under reverberant conditions show that the proposed method outperforms conventional methods.
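A KLD-based loss of this kind compares two distributions of the output signal rather than two waveforms, so its gradient can flow through every intermediate variable that parameterizes the distribution. The sketch below is not the paper's exact formulation: it assumes each time-frequency bin is modeled as a zero-mean, circularly symmetric complex Gaussian, in which case the KLD between a teacher SCM from the unsupervised separator and an SCM built from DNN-inferred variables has the closed form tr(Σ_d⁻¹Σ_t) + ln det Σ_d − ln det Σ_t − M. All function and variable names are hypothetical.

```python
import torch

def complex_gaussian_kld(sigma_t, sigma_d, eps=1e-6):
    """KL divergence KL( N(0, sigma_t) || N(0, sigma_d) ) between two
    zero-mean, circularly symmetric complex Gaussians, per TF bin.

    sigma_t: (..., M, M) teacher SCM from the unsupervised separator
    sigma_d: (..., M, M) SCM built from DNN-inferred intermediate variables
    """
    M = sigma_t.shape[-1]
    eye = torch.eye(M, dtype=sigma_t.dtype, device=sigma_t.device)
    sigma_t = sigma_t + eps * eye  # regularize for numerically stable solves
    sigma_d = sigma_d + eps * eye
    # Compute sigma_d^{-1} sigma_t without forming an explicit inverse.
    ratio = torch.linalg.solve(sigma_d, sigma_t)
    trace = ratio.diagonal(dim1=-2, dim2=-1).sum(-1).real
    # log det(sigma_d) - log det(sigma_t); slogdet accepts complex input.
    logdet = (torch.linalg.slogdet(sigma_d)[1]
              - torch.linalg.slogdet(sigma_t)[1])
    return trace + logdet - M

# Toy example: M = 4 microphones, a batch of 8 time-frequency bins.
M = 4
a = torch.randn(8, M, M, dtype=torch.complex64)
b = torch.randn(8, M, M, dtype=torch.complex64, requires_grad=True)
sigma_t = a @ a.conj().transpose(-2, -1)  # Hermitian PSD teacher SCM
sigma_d = b @ b.conj().transpose(-2, -1)  # differentiable student SCM
loss = complex_gaussian_kld(sigma_t, sigma_d).mean()
loss.backward()  # gradients flow into the DNN-side parameters (here, b)
```

Using `torch.linalg.solve` rather than an explicit matrix inverse is the standard choice here: it is cheaper and better conditioned, which matters when the regularized SCMs are nearly rank-deficient.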
