
Noise-aware target extension with self-distillation for robust speech recognition

Ju-seok Seong (Hanyang University); Jeong-Hwan Choi (Hanyang University); Jehyun Kyung (Hanyang University); Ye-Rin Jeoung (Hanyang University); Joon-Hyuk Chang (Hanyang University)

09 Jun 2023

Data augmentation using additive noise is a common framework for robustly training automatic speech recognition (ASR) models. To utilize noise information efficiently, previous studies used an additional branch to classify noise conditions. This added branch has a limited effect on the ASR model because it operates independently of the ASR branch that classifies senones. In this paper, we propose a noise-aware target extension (NATE) that extends the senone target to contain noise awareness by jointly classifying the senone and the noise condition in a single branch. In the inference stage, the output of the model is split by noise condition and then aggregated to recover the senone posterior distribution. In addition, we combine NATE with self-distillation (NATE_sd) to reduce the number of model parameters and avoid discrepancies between the outputs of training and inference. The effectiveness of NATE is validated on two benchmark development and evaluation sets as well as simulated noisy test sets, yielding significant improvements over previous methods.
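The paper's implementation details are not reproduced on this page, but the core idea of the target extension can be sketched as follows. The snippet below is a minimal, illustrative PyTorch sketch (not the authors' code): a single output head over senone-noise pairs, trained with a cross-entropy target indexed as noise_id * num_senones + senone_id, and an inference-time aggregation that sums probabilities over noise conditions to recover the senone posterior. The class and argument names are hypothetical, and the self-distillation component is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointSenoneNoiseHead(nn.Module):
    """Illustrative single-branch head over extended (senone, noise) targets."""

    def __init__(self, feat_dim: int, num_senones: int, num_noises: int):
        super().__init__()
        self.num_senones = num_senones
        self.num_noises = num_noises
        # One output unit per (noise condition, senone) pair.
        self.linear = nn.Linear(feat_dim, num_noises * num_senones)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim) -> logits: (batch, time, noises * senones)
        return self.linear(feats)

    def senone_posterior(self, feats: torch.Tensor) -> torch.Tensor:
        # Inference: softmax over the extended targets, then sum the
        # probabilities of each senone across noise conditions so the
        # result matches an ordinary senone posterior distribution.
        probs = F.softmax(self.forward(feats), dim=-1)
        probs = probs.view(*probs.shape[:-1], self.num_noises, self.num_senones)
        return probs.sum(dim=-2)


# Training sketch: the extended target index encodes both labels.
# senone_id, noise_id = 417, 2          # hypothetical frame labels
# target = noise_id * num_senones + senone_id
# loss = F.cross_entropy(head(feats).transpose(1, 2), targets)
```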
