DISTILL-QUANTIZE-TUNE - LEVERAGING LARGE TEACHERS FOR LOW-FOOTPRINT EFFICIENT MULTILINGUAL NLU ON EDGE

Pegah Kharazmi (Amazon); Zhewei Zhao (Amazon); Clement Chung (Amazon); Samridhi Choudhary (Amazon)

07 Jun 2023

This paper describes Distill-Quantize-Tune (DQT), a pipeline for creating viable small-footprint multilingual models that can perform NLU directly on extremely resource-constrained Edge devices. We distill semantic knowledge from a large transformer-based teacher, which has been trained on a large amount of public and private data, into our Edge candidate (student) model, which is Bi-LSTM based, and further compress the student model using a lossy quantization method. We show that, unlike in the monolingual case, post-compression fine-tuning on downstream tasks is not enough in a multilingual scenario to recover the performance loss caused by compression. We design a fine-tuning pipeline that recovers the lost performance using a compounded loss function consisting of NLU, distillation and compression losses. We show that pre-biasing the encoder with semantics learned on a language modeling task can further improve performance when used in conjunction with the DQT pipeline. Our best performing multilingual model achieves a size reduction of 85% and 99.2% compared to the uncompressed student and teacher models, respectively. It outperforms the uncompressed monolingual models (by >30% on average) across all languages on our in-house data. We further validate our approach and see similar trends on the public MultiATIS++ dataset.
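The abstract describes fine-tuning with a compounded loss that combines NLU, distillation and compression terms. The sketch below illustrates one plausible way such a loss could be assembled; the specific formulations (cross-entropy for the NLU term, temperature-scaled KL divergence for distillation, an L2 penalty between full-precision and quantized weights for compression) and the lambda weights are illustrative assumptions, not the paper's actual definitions.

```python
# Hypothetical sketch of a compounded fine-tuning loss with NLU,
# distillation and compression terms. All specific choices below are
# assumptions for illustration, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def compounded_loss(student_logits, teacher_logits, labels,
                    fp_weights, quantized_weights,
                    lambda_distill=0.5, lambda_compress=0.1, temperature=2.0):
    # NLU loss: standard cross-entropy against the downstream task labels.
    nlu = F.cross_entropy(student_logits, labels)

    # Distillation loss: match the student's softened output distribution
    # to the teacher's via temperature-scaled KL divergence.
    distill = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Compression loss: penalize the gap between full-precision and
    # quantized weights so fine-tuning stays close to the compressed model.
    compress = sum(F.mse_loss(w, q) for w, q in zip(fp_weights, quantized_weights))

    return nlu + lambda_distill * distill + lambda_compress * compress
```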
