  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 14:56
04 May 2020

Speech enhancement has traditionally depended on estimating the local, time-varying SNR, but it was recently shown that the SNR can be marginalized, in a Bayesian sense, out of the minimum-mean-square-error (MMSE) solution. Specifically, the local SNR is treated as a random variable, and the Bayesian integration is realized approximately under a hyperprior distribution. The approach proposed in our paper additionally takes the multimodal nature of the resulting posterior distribution into account for speech inference: the extrema of the posterior, which are readily obtained via differentiation, are combined according to their widths, heights, and abscissae. The resulting solution is not in closed form, but it is found within a few iterations. The approach delivers a spectral weighting of noisy speech that simultaneously maximizes instrumental criteria of speech quality, specifically the segmental SNR, STOI score, and PESQ.
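To make the idea of combining posterior extrema by width, height, and abscissa more concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes a generic Laplace-style approximation in which each local maximum of a one-dimensional posterior is treated as a Gaussian bump, so that its probability mass is roughly height × width × √(2π). The function name `combine_modes` and the example numbers are hypothetical and chosen purely for illustration.

```python
import math


def combine_modes(abscissae, heights, widths):
    """Approximate a posterior mean from the local maxima of a density.

    Assumption (not taken from the paper): each mode is approximated by a
    Gaussian bump, so its mass is about height * width * sqrt(2 * pi),
    where `width` plays the role of the bump's standard deviation.
    """
    if not (len(abscissae) == len(heights) == len(widths)):
        raise ValueError("abscissae, heights and widths must have equal length")

    # Approximate probability mass carried by each mode.
    masses = [h * w * math.sqrt(2.0 * math.pi) for h, w in zip(heights, widths)]
    total = sum(masses)
    if total <= 0.0:
        raise ValueError("total mass must be positive")

    # Mass-weighted average of the mode locations.
    return sum(m * x for m, x in zip(masses, abscissae)) / total


# Hypothetical bimodal example: a narrow, tall mode at 0 and a wide, low mode at 4.
norm = math.sqrt(2.0 * math.pi)
example_estimate = combine_modes(
    abscissae=[0.0, 4.0],
    heights=[0.7 / (0.5 * norm), 0.3 / (1.0 * norm)],
    widths=[0.5, 1.0],
)
```

In this toy case the two modes carry masses of about 0.7 and 0.3, so the combined estimate falls between the two abscissae and closer to the heavier mode. A single-mode (MAP-style) estimate would instead return one abscissa and discard the other mode entirely.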