Diffusion Models for Speech Enhancement and Restoration

Timo Gerkmann

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 01:45:18

19 Jun 2024

Today, speech communication devices for telephony, video telephony, and assistive listening are widely used, by many of us daily. Often, speech communication is disturbed by acoustic or transmission artifacts. Acoustic artifacts include background noise and reverberation; transmission artifacts include, e.g., coding artifacts such as acoustic bandwidth reduction or packet loss. Speech enhancement and restoration aim to reduce these acoustic and transmission artifacts. While predictive approaches have traditionally been used, generative approaches, particularly diffusion models, have recently attracted increasing interest. These generative approaches achieve remarkable perceived speech quality across a broad range of acoustic disturbances and transmission artifacts. In this talk, the presenter will introduce diffusion models for speech enhancement and restoration. He will start by explaining the underlying concept and then explain how powerful approaches like SGMSE+ differ from vanilla diffusion models by integrating environmental noise into the stochastic differential equation describing the forward and backward diffusion processes. Besides the strengths of diffusion models, he will also highlight current research topics that aim to reduce computational complexity and hallucinations in challenging situations.

Tags:

SPS Webinar 2024

Diffusion models

speech enhancement

speech restoration

Audio and Acoustic Signal Processing