TEMPORAL KNOWLEDGE DISTILLATION FOR ON-DEVICE AUDIO CLASSIFICATION

Kwanghee Choi, Martin Kersner, Jacob Morton, Buru Chang

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 00:12:56

10 May 2022

Improving the performance of on-device audio classification models remains a challenge given the computational limits of the mobile environment. Many studies leverage knowledge distillation to boost predictive performance by transferring the knowledge from large models to on-device models. However, most lack a mechanism to distill the essence of the temporal information, which is crucial to audio classification tasks, or similar architecture is often required. In this paper, we propose a new knowledge distillation method designed to incorporate the temporal knowledge embedded in attention weights of large transformer-based models into on-device models. Our distillation method is applicable to various types of architectures, including the non-attention-based architectures such as CNNs or RNNs, while retaining the original network architecture during inference. Through extensive experiments on both an audio event detection dataset and a noisy keyword spotting dataset, we show that our proposed method improves the predictive performance across diverse on-device architectures.

Tags:

knowledge distillation

transformer

on-device

audio classification

TEMPORAL KNOWLEDGE DISTILLATION FOR ON-DEVICE AUDIO CLASSIFICATION

Kwanghee Choi, Martin Kersner, Jacob Morton, Buru Chang

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

Devising Transformers as an Autoencoder for Unsupervised Multivariate Time Series Imputation

Slides: Devising Transformers as an Autoencoder for Unsupervised Multivariate Time Series Imputation

CONTENT-ADAPTIVE PARALLEL ENTROPY CODING FOR END-TO-END IMAGE COMPRESSION

Join the IEEE Signal Processing Society