Massively Multilingual Shallow Fusion with Large Language Models

Ke Hu (Google); Tara Sainath (Google); Bo Li (Google); Nan Du (Google Brain); Yanping Huang (Google Brain); Andrew M Dai (Google Brain); Yu Zhang (Google); Rodrigo Cabrera (Google); Zhifeng Chen (Google); Trevor Strohman (Google)

07 Jun 2023

While large language models (LLMs) have made impressive progress in natural language processing, it remains unclear how to use them to improve automatic speech recognition (ASR). In this work, we propose training a single multilingual language model (LM) for shallow fusion across multiple languages. We push the limits of the multilingual LM to cover up to 84 languages by scaling up with a mixture-of-experts LLM, the generalist language model (GLaM). As the number of experts increases, GLaM dynamically selects only two at each decoding step, which keeps the inference computation roughly constant. We then apply GLaM to a multilingual shallow fusion task based on a state-of-the-art end-to-end model. Compared to a dense LM with similar inference computation, GLaM reduces the word error rate (WER) on an English long-tail test set by 4.4% relative. In a multilingual shallow fusion task, GLaM improves 41 out of 50 languages with an average relative WER reduction of 3.85%, and a maximum reduction of 10%. Compared to the baseline model, GLaM achieves an average WER reduction of 5.53% over 43 languages.
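The two mechanisms named in the abstract can be illustrated with a minimal sketch. It assumes the common log-linear form of shallow fusion (ASR log-probability plus a weighted LM log-probability) and a generic top-2 softmax gate. The function names, the hypothesis dictionary layout, and the weight 0.3 are hypothetical choices for illustration, not details taken from the paper or from GLaM's implementation.

```python
import math


def shallow_fusion_rescore(hypotheses, lm_weight=0.3):
    """Return the hypothesis with the highest fused score.

    Fused score = asr_logp + lm_weight * lm_logp (log-linear interpolation).
    lm_weight=0.3 is an illustrative assumption, not a value from the paper.
    """
    def fused(h):
        return h["asr_logp"] + lm_weight * h["lm_logp"]

    return max(hypotheses, key=fused)


def top2_gate(gate_logits):
    """Select the two highest-scoring experts and renormalize their weights.

    Generic top-2 routing sketch: only two experts are evaluated per step,
    so compute stays roughly constant as the number of experts grows.
    """
    order = sorted(range(len(gate_logits)),
                   key=lambda i: gate_logits[i], reverse=True)
    top = order[:2]
    # Softmax restricted to the two selected experts (shifted for stability).
    exps = [math.exp(gate_logits[i] - gate_logits[top[0]]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical n-best list: the LM prefers the second hypothesis.
nbest = [
    {"text": "recognize speech", "asr_logp": -1.5, "lm_logp": -2.0},
    {"text": "wreck a nice beach", "asr_logp": -1.0, "lm_logp": -10.0},
]
best = shallow_fusion_rescore(nbest)
routing = top2_gate([0.0, 1.0, 3.0, 2.0])
```

With `lm_weight=0` the ASR score alone decides and the second entry wins; with the LM term included, the first entry is selected. The gate returns two `(expert_index, weight)` pairs whose weights sum to one, regardless of how many experts the logits cover.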

Tags:

language modeling

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
