MoLE: MIXTURE OF LANGUAGE EXPERTS FOR MULTI-LINGUAL AUTOMATIC SPEECH RECOGNITION
Yoohwan Kwon (Naver Corporation); Soo-Whan Chung (Naver Corporation)
Multi-lingual speech recognition aims to distinguish linguistic expressions across different languages while simultaneously integrating acoustic processing.
In contrast, most current multi-lingual speech recognition research follows a language-aware paradigm, mainly aimed at improving recognition performance rather than discriminating language characteristics.
In this paper, we present a multi-lingual speech recognition network named Mixture-of-Language-Experts (MoLE), which digests speech in a variety of languages.
Specifically, MoLE analyzes the linguistic expression of input speech in an arbitrary language and activates a language-specific expert via a lightweight language tokenizer.
The tokenizer not only activates the experts but also estimates the reliability of the activation.
Based on this reliability, the activated expert and a language-agnostic expert are aggregated to produce a language-conditioned embedding for efficient speech recognition.
Our proposed model is evaluated in a five-language scenario, and the experimental results show that our structure is advantageous for multi-lingual recognition, especially for speech in low-resource languages.
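To make the described aggregation concrete, here is a minimal PyTorch-style sketch of the mechanism outlined in the abstract. The module name MoLELayer, the tokenizer's form, the per-language linear experts, and the reliability-weighted sum are all illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of the MoLE aggregation described in the abstract.
# All names, dimensions, and the exact aggregation rule are assumptions
# for illustration; the paper's actual architecture may differ.
import torch
import torch.nn as nn


class MoLELayer(nn.Module):
    def __init__(self, dim: int, num_languages: int):
        super().__init__()
        # Lightweight language tokenizer: predicts the language id and,
        # via its softmax confidence, a reliability score (assumption).
        self.tokenizer = nn.Linear(dim, num_languages)
        # One expert per language plus one language-agnostic expert
        # (linear layers stand in for whatever the experts really are).
        self.language_experts = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(num_languages)
        )
        self.agnostic_expert = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim) frame-level speech embeddings.
        lang_logits = self.tokenizer(x.mean(dim=1))   # (batch, num_languages)
        probs = lang_logits.softmax(dim=-1)
        reliability, lang_id = probs.max(dim=-1)      # confidence of the prediction

        # Activate the language-specific expert chosen by the tokenizer.
        expert_out = torch.stack(
            [self.language_experts[i](x[b]) for b, i in enumerate(lang_id.tolist())]
        )                                              # (batch, time, dim)
        agnostic_out = self.agnostic_expert(x)

        # Reliability-weighted aggregation of the activated expert and the
        # language-agnostic expert (assumed form of the aggregation).
        w = reliability.view(-1, 1, 1)
        return w * expert_out + (1.0 - w) * agnostic_out
```

Under this reading, a confident language prediction leans the embedding toward the language-specific expert, while an unreliable prediction falls back toward the language-agnostic expert, which is one plausible way such a scheme could help low-resource languages.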