Analysis of Noisy-target Training for DNN-based speech enhancement

Takuya Fujimura (Nagoya University); Tomoki Toda (Nagoya University)

07 Jun 2023

Deep neural network (DNN)-based speech enhancement usually uses clean speech as the training target. However, clean speech is costly to record, so it is hard to collect in large amounts, and the performance of current speech enhancement systems is limited by the amount of available training data. To relax this limitation, Noisy-target Training (NyTT), which uses noisy speech as the training target, has been proposed. Although experiments have shown that NyTT can train a DNN without clean speech, no detailed analysis has been conducted and its behavior is not well understood. In this paper, we conduct various analyses to deepen our understanding of NyTT. In addition, based on the properties of NyTT, we propose a refined method whose performance is comparable to that of training with clean speech. Furthermore, we show that performance can be improved by using a huge amount of noisy speech together with clean speech.
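As a rough illustration of what "noisy speech as a training target" can look like in practice, the sketch below builds one training pair. The abstract states only that the target is noisy speech; forming the model input by mixing additional noise into that noisy speech is an assumption here, as are the names `mix_at_snr` and `make_nytt_pair`, the SNR value, and the random placeholder signals. It is not the authors' implementation.

```python
import numpy as np


def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal`, scaled so the signal-to-noise power ratio is `snr_db`."""
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that brings the noise to the requested power relative to the signal.
    gain = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + gain * noise


def make_nytt_pair(noisy_speech, extra_noise, snr_db):
    """Return (model_input, target); the target is the noisy speech itself.

    Assumption: the input is the noisy speech with additional noise mixed in.
    """
    model_input = mix_at_snr(noisy_speech, extra_noise, snr_db)
    return model_input, noisy_speech


# Placeholder signals standing in for real recordings (hypothetical data).
rng = np.random.default_rng(0)
noisy_speech = rng.standard_normal(16000)
extra_noise = rng.standard_normal(16000)

model_input, target = make_nytt_pair(noisy_speech, extra_noise, snr_db=5.0)
```

A network trained on such pairs would see `model_input` and be penalized against `target`, so no clean recording is needed to form the pair.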

Tags: Audio signal enhancement and restoration

Included in: IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
