
Noise-aware target extension with self-distillation for robust speech recognition

Ju-seok Seong (Hanyang University); Jeong-Hwan Choi (Hanyang University); Jehyun Kyung (Hanyang University); Ye-Rin Jeoung (Hanyang University); Joon-Hyuk Chang (Hanyang University)

09 Jun 2023

Data augmentation using additive noise is a common framework for robustly training automatic speech recognition (ASR) models. To utilize noise information efficiently, previous studies used an additional branch to classify noise conditions. This added branch has a limited effect on the ASR model because it operates independently of the ASR branch that classifies senones. In this paper, we propose a noise-aware target extension (NATE) that extends the senone target to contain noise awareness by jointly classifying the senone and the noise condition in a single branch. In the inference stage, the output of the model is split by noise condition and then aggregated to recover the senone posterior distribution. In addition, we combine NATE with self-distillation (NATE_sd) to reduce the number of model parameters and avoid discrepancies between the outputs of training and inference. The effectiveness of NATE is validated on two benchmark development and evaluation sets as well as simulated noisy test sets, yielding significant improvements over previous methods.
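The paper's implementation details are not reproduced on this page, but the core idea of the target extension can be sketched as follows. The snippet below is a minimal, illustrative PyTorch sketch (not the authors' code): a single output head over senone-noise pairs, trained with a cross-entropy target indexed as noise_id * num_senones + senone_id, and an inference-time aggregation that sums probabilities over noise conditions to recover the senone posterior. The class and argument names are hypothetical, and the self-distillation component is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointSenoneNoiseHead(nn.Module):
    """Illustrative single-branch head over extended (senone, noise) targets."""

    def __init__(self, feat_dim: int, num_senones: int, num_noises: int):
        super().__init__()
        self.num_senones = num_senones
        self.num_noises = num_noises
        # One output unit per (noise condition, senone) pair.
        self.linear = nn.Linear(feat_dim, num_noises * num_senones)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim) -> logits: (batch, time, noises * senones)
        return self.linear(feats)

    def senone_posterior(self, feats: torch.Tensor) -> torch.Tensor:
        # Inference: softmax over the extended targets, then sum the
        # probabilities of each senone across noise conditions so the
        # result matches an ordinary senone posterior distribution.
        probs = F.softmax(self.forward(feats), dim=-1)
        probs = probs.view(*probs.shape[:-1], self.num_noises, self.num_senones)
        return probs.sum(dim=-2)


# Training sketch: the extended target index encodes both labels.
# senone_id, noise_id = 417, 2          # hypothetical frame labels
# target = noise_id * num_senones + senone_id
# loss = F.cross_entropy(head(feats).transpose(1, 2), targets)
```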
