
From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition

Chao-Han Huck Yang (Georgia Institute of Technology); Bo Li (Google); Yu Zhang (Google); Nanxin Chen (Johns Hopkins University); Rohit Prabhavalkar (Google); Tara Sainath (Google); Trevor Strohman (Google)

06 Jun 2023

In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English models to recognize other languages. We design different auxiliary neural architectures focused on learnable pre-trained feature enhancement that, for the first time, empower model reprogramming for automatic speech recognition (ASR). Specifically, we carefully investigate how to select trainable components (i.e., encoder and decoder) of a conformer-based RNN-Transducer used as a frozen pre-trained backbone. Experiments on a seven-language Multilingual LibriSpeech (MLS) task show that model reprogramming requires only 4.18% (11.0M out of 270M) to 6.82% (45M out of 660M) of the trainable parameters of a full ASR model to achieve competitive recognition results, with WERs ranging from 12.16% to 8.14%. In addition, we identify different setups that enable large-scale pre-trained ASR models to succeed in both monolingual and multilingual speech recognition. Our methods outperform existing ASR tuning architectures and their extensions with self-supervised losses (e.g., w2v-bert), yielding lower WER and better sample efficiency in terms of training time.
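The central idea is to keep the pre-trained English backbone frozen and train only a small auxiliary module that enhances the input features for the target language. Below is a minimal PyTorch sketch of that pattern, assuming a generic frozen backbone; the module names (InputReprogramming, ReprogrammedASR) and the convolutional enhancement layers are illustrative placeholders, not the paper's exact conformer RNN-Transducer components.

import torch
import torch.nn as nn

class InputReprogramming(nn.Module):
    """Small trainable network that additively perturbs input features
    so a frozen English backbone can be re-purposed for a new language
    (hypothetical sketch, not the paper's exact architecture)."""
    def __init__(self, feat_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.enhance = nn.Sequential(
            nn.Conv1d(feat_dim, hidden_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden_dim, feat_dim, kernel_size=3, padding=1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim) log-mel features
        delta = self.enhance(feats.transpose(1, 2)).transpose(1, 2)
        return feats + delta  # additive reprogramming of the input

class ReprogrammedASR(nn.Module):
    """Wraps a pre-trained ASR backbone: freezes it and prepends the
    trainable reprogramming module, so only a few percent of the
    parameters are updated."""
    def __init__(self, frozen_backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = frozen_backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # keep the pre-trained model frozen
        self.reprogram = InputReprogramming(feat_dim)

    def forward(self, feats: torch.Tensor, *args, **kwargs):
        return self.backbone(self.reprogram(feats), *args, **kwargs)

In training, the optimizer would be given only the parameters of the reprogramming module (and whichever backbone components are selectively unfrozen, e.g., the encoder or decoder, as studied in the paper), which is what keeps the trainable-parameter budget at a small fraction of the full model.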
