SRTNet: Time Domain Speech Enhancement Via Stochastic Refinement

Zhibin Qiu (Xinjiang University); Mengfan Fu (Xinjiang University); Yinfeng Yu (Department of Computer Science and Technology, State Key Lab on Intelligent Technology and Systems, Tsinghua University, Beijing, China; Xinjiang University); Lili Yin (Xinjiang University); Fuchun Sun (Tsinghua University); Hao Huang (Xinjiang University)

07 Jun 2023

The diffusion model, a generative model now widely used in image generation and audio synthesis, has rarely been applied to speech enhancement. In this paper, we use the diffusion model as a stochastic refinement module. We propose SRTNet, a novel method for speech enhancement via stochastic refinement operating entirely in the time domain. Specifically, we design a joint network consisting of a deterministic module and a stochastic module, which together form the "enhance-and-refine" paradigm. We theoretically demonstrate the feasibility of our method and experimentally show that it achieves faster training, faster sampling, and higher enhancement quality.
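The "enhance-and-refine" paradigm described in the abstract can be illustrated with a minimal sketch: a deterministic network produces an initial time-domain estimate, and a diffusion-based stochastic module is trained to model the residual between the clean speech and that estimate. All module names, layer sizes, and the noise schedule below are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class DeterministicEnhancer(nn.Module):
    """Hypothetical stand-in for the deterministic module: maps noisy waveform to an initial estimate."""

    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=15, padding=7),
        )

    def forward(self, noisy):  # noisy: (batch, 1, samples)
        return self.net(noisy)  # initial clean-speech estimate


class ResidualDiffusionRefiner(nn.Module):
    """Hypothetical stand-in for the stochastic module: predicts the Gaussian noise added to the
    residual (clean - estimate) at a diffusion step, conditioned on the noisy input and the estimate."""

    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(3, channels, kernel_size=15, padding=7),  # inputs: [noisy, estimate, noised residual]
            nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=15, padding=7),
        )

    def forward(self, noisy, estimate, residual_t, t):
        # A real model would embed the step index t and inject it into the network; omitted for brevity.
        return self.net(torch.cat([noisy, estimate, residual_t], dim=1))


def training_step(noisy, clean, enhancer, refiner, alphas_cumprod):
    """One illustrative training step of the enhance-and-refine paradigm."""
    estimate = enhancer(noisy)          # deterministic enhancement
    residual = clean - estimate         # target signal for the stochastic refinement module
    t = torch.randint(0, len(alphas_cumprod), (noisy.shape[0],))
    a_bar = alphas_cumprod[t].view(-1, 1, 1)
    eps = torch.randn_like(residual)
    residual_t = a_bar.sqrt() * residual + (1 - a_bar).sqrt() * eps  # forward diffusion of the residual
    eps_hat = refiner(noisy, estimate, residual_t, t)
    return nn.functional.mse_loss(eps_hat, eps)  # standard epsilon-prediction objective


if __name__ == "__main__":
    betas = torch.linspace(1e-4, 0.05, 50)               # assumed linear noise schedule
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    noisy = torch.randn(2, 1, 16000)
    clean = torch.randn(2, 1, 16000)
    loss = training_step(noisy, clean, DeterministicEnhancer(), ResidualDiffusionRefiner(), alphas_cumprod)
    print(loss.item())
```

At inference time, the same paradigm would run the deterministic enhancer once and then iterate the reverse diffusion process on the residual, adding the refined residual back to the initial estimate to obtain the enhanced waveform.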
