DISTILL-QUANTIZE-TUNE - LEVERAGING LARGE TEACHERS FOR LOW-FOOTPRINT EFFICIENT MULTILINGUAL NLU ON EDGE

Pegah Kharazmi (Amazon); Zhewei Zhao (Amazon); Clement Chung (Amazon); Samridhi Choudhary (Amazon)

07 Jun 2023

This paper describes Distill-Quantize-Tune (DQT), a pipeline for creating viable small-footprint multilingual models that can perform NLU directly on extremely resource-constrained Edge devices. We distill semantic knowledge from a large transformer-based teacher, which has been trained on a large amount of public and private data, into our Edge candidate (student) model, which is Bi-LSTM based, and further compress the student model using a lossy quantization method. We show that, unlike in the monolingual case, post-compression fine-tuning on downstream tasks is not enough in a multilingual scenario to recover the performance loss caused by compression. We design a fine-tuning pipeline that recovers the lost performance using a compounded loss function consisting of NLU, distillation and compression losses. We show that pre-biasing the encoder with semantics learned on a language modeling task can further improve performance when used in conjunction with the DQT pipeline. Our best performing multilingual model achieves a size reduction of 85% and 99.2% compared to the uncompressed student and teacher models, respectively. It outperforms the uncompressed monolingual models (by >30% on average) across all languages on our in-house data. We further validate our approach and see similar trends on the public MultiATIS++ dataset.
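The abstract describes fine-tuning with a compounded loss that combines NLU, distillation and compression terms. The sketch below illustrates one plausible way such a loss could be assembled; the specific formulations (cross-entropy for the NLU term, temperature-scaled KL divergence for distillation, an L2 penalty between full-precision and quantized weights for compression) and the lambda weights are illustrative assumptions, not the paper's actual definitions.

```python
# Hypothetical sketch of a compounded fine-tuning loss with NLU,
# distillation and compression terms. All specific choices below are
# assumptions for illustration, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def compounded_loss(student_logits, teacher_logits, labels,
                    fp_weights, quantized_weights,
                    lambda_distill=0.5, lambda_compress=0.1, temperature=2.0):
    # NLU loss: standard cross-entropy against the downstream task labels.
    nlu = F.cross_entropy(student_logits, labels)

    # Distillation loss: match the student's softened output distribution
    # to the teacher's via temperature-scaled KL divergence.
    distill = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Compression loss: penalize the gap between full-precision and
    # quantized weights so fine-tuning stays close to the compressed model.
    compress = sum(F.mse_loss(w, q) for w, q in zip(fp_weights, quantized_weights))

    return nlu + lambda_distill * distill + lambda_compress * compress
```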
