SELECTIVE FILM CONDITIONING WITH CTC-BASED ASR PROBABILITY FOR SPEECH ENHANCEMENT

Da-Hee Yang (Hanyang University); Joon-Hyuk Chang (Hanyang University)

06 Jun 2023

Improving speech quality and intelligibility for automatic speech recognition (ASR) is an important goal when designing speech enhancement (SE) systems. However, using an SE network is not guaranteed to improve ASR performance, owing to the discrepancy between how the two systems are trained. Recent studies have therefore incorporated ASR information into SE systems by training the ASR and SE systems jointly. Although these studies have improved performance, they are inefficient because combining the two networks requires a large model. To address this limitation, we propose an efficient way to use feature-wise linear modulation (FiLM) conditioning with connectionist temporal classification (CTC)-based ASR probabilities in the SE system. The proposed model stacks a FiLM layer with selective learning on each temporal convolutional network of the SE estimation module. This allows the SE network to adaptively select ASR information based on the relationship between context and acoustic information. The proposed method improves both SE and ASR performance and is more robust to noise, with only a small increase in the number of parameters.
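The abstract does not give the layer equations, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes standard FiLM (a per-channel scale and shift predicted from the conditioning vector) and models the "selective" part as a sigmoid gate that blends the original acoustic feature with its FiLM-modulated version. The class name GatedFiLM, the gating formula, the random initialization, and all dimensions are hypothetical choices made for illustration.

```python
import math
import random


def softmax(logits):
    """Turn per-frame ASR logits into a probability vector (blank included)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def linear(weights, bias, vec):
    """Affine map; weights is a list of rows, one per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def film(features, gamma, beta):
    """Plain FiLM: scale and shift each feature channel."""
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


def init_linear(rng, out_dim, in_dim, scale=0.1):
    """Small random weights and zero bias (illustrative initialization)."""
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class GatedFiLM:
    """Hypothetical layer: FiLM driven by ASR posteriors plus a learned gate."""

    def __init__(self, feat_dim, cond_dim, seed=0):
        rng = random.Random(seed)
        self.to_gamma = init_linear(rng, feat_dim, cond_dim)
        self.to_beta = init_linear(rng, feat_dim, cond_dim)
        # Assumption: the gate sees both acoustic features and ASR posteriors.
        self.to_gate = init_linear(rng, feat_dim, feat_dim + cond_dim)

    def __call__(self, frames, posteriors):
        outputs = []
        for x, p in zip(frames, posteriors):
            gamma = linear(*self.to_gamma, p)
            beta = linear(*self.to_beta, p)
            gate = [sigmoid(z) for z in linear(*self.to_gate, x + p)]
            modulated = film(x, gamma, beta)
            # Gate near 0 keeps the acoustic feature; near 1 uses the FiLM output.
            outputs.append([(1.0 - g) * xi + g * mi
                            for g, xi, mi in zip(gate, x, modulated)])
        return outputs


# Two frames of 3-channel features, conditioned on a 4-symbol CTC vocabulary.
layer = GatedFiLM(feat_dim=3, cond_dim=4)
frames = [[0.5, -0.2, 0.1], [0.3, 0.0, -0.4]]
posteriors = [softmax([2.0, 0.1, 0.1, 0.1]), softmax([0.1, 0.1, 3.0, 0.1])]
enhanced = layer(frames, posteriors)
```

In this sketch the conditioning path adds only three small projections per layer, which is in the spirit of the abstract's claim of a small parameter increase; in a real system the weights would be learned and the features would come from the temporal convolutional network blocks.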

Tags:

Speech enhancement and separation

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
