A Framework for Unified Real-time Personalized and Non-Personalized Speech Enhancement

Zhepei Wang (University of Illinois at Urbana-Champaign); Ritwik Giri (Amazon); Devansh Shah (Amazon Web Services); Jean-Marc Valin (Amazon); Michael M Goodwin (AWS); Paris Smaragdis (University of Illinois at Urbana-Champaign)

06 Jun 2023

In this study, we present an approach to training a single speech enhancement network that performs both personalized and non-personalized speech enhancement. This is achieved by incorporating a frame-wise conditioning input that specifies the type of enhancement output. To improve the quality of the enhanced output and mitigate oversuppression, we experiment with re-weighting frames according to the presence or absence of speech activity and with applying augmentations to speaker embeddings. Training in a multi-task learning setting, we show empirically that the proposed unified model obtains promising results on both personalized and non-personalized speech enhancement benchmarks and reaches performance similar to that of models trained specifically for either task. This strong performance indicates that the unified model is a more economical alternative to maintaining separate task-specific models at inference time.
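The two mechanisms named in the abstract, a frame-wise conditioning input and a loss re-weighted by speech activity, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the choice to zero the speaker embedding on non-personalized frames, the appended mode flag, and the weight values are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def build_conditioning(personalized, speaker_embedding):
    """Build a hypothetical frame-wise conditioning input.

    personalized: (T,) booleans, True where personalized output is requested.
    speaker_embedding: (D,) target-speaker embedding.
    Returns a (T, D + 1) array. On personalized frames it holds the embedding
    plus a flag of 1.0; on other frames it is all zeros (an assumed encoding).
    """
    flags = np.asarray(personalized, dtype=bool).astype(float)[:, None]  # (T, 1)
    emb = np.asarray(speaker_embedding, dtype=float)[None, :]            # (1, D)
    return np.concatenate([flags * emb, flags], axis=1)


def vad_weighted_mse(estimate, target, speech_active, w_speech=1.0, w_silence=0.5):
    """Mean squared error with frames re-weighted by speech activity.

    estimate, target: (T, F) spectral features.
    speech_active: (T,) booleans from a voice activity detector.
    w_speech and w_silence are assumed values, not taken from the paper.
    """
    weights = np.where(np.asarray(speech_active, dtype=bool), w_speech, w_silence)
    per_frame_error = np.mean((np.asarray(estimate) - np.asarray(target)) ** 2, axis=1)
    return float(np.sum(weights * per_frame_error) / np.sum(weights))


# Usage: three frames, personalized on frames 0 and 2 only.
cond = build_conditioning([True, False, True], [0.5, -0.5])
loss = vad_weighted_mse(
    estimate=np.zeros((2, 3)),
    target=np.array([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]),
    speech_active=[True, False],
)
```

Because the conditioning is built per frame, a single network could in principle switch between the two enhancement modes within one stream, which is the property that makes one unified model sufficient at inference time.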

Tags:

Speech enhancement and separation

Conference:

IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person conference)