Length: 00:09:00
22 Sep 2021

Neural network compression has become an important practical step when deploying trained models. We consider the problem of low-rank compression of neural networks with the goal of optimizing the measured inference time. Given a neural network and a target device to run it on, we want to find the matrix ranks and the weight values of the compressed model so that the network runs as fast as possible on the device while achieving the best task performance (e.g., classification accuracy). This is a hard optimization problem involving weights, ranks, and device constraints. To tackle it, we first implement a simple yet accurate model of the on-device runtime that requires only a few measurements. We then give a suitable formulation of the optimization problem involving the proposed runtime model and solve it using alternating optimization. We validate our approach on various neural networks and show that, by using our estimated runtime model, we achieve better task performance than FLOPs-based methods for the same runtime budget on the actual device.
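The paper's exact runtime model and optimizer are not reproduced on this page, but the two ingredients the abstract names can be sketched. Below is a minimal Python/NumPy illustration, not the authors' implementation: truncated SVD produces the rank-r factors of a layer, and a hypothetical affine per-layer runtime model t(r) ≈ a·r + b is fit by least squares to a few on-device timings (the helper names, the affine form, and the example numbers are all assumptions for illustration).

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Truncated-SVD factorization W ≈ U_r @ V_r, with U_r: (m, rank), V_r: (rank, n).

    Replacing the dense map W @ x by U_r @ (V_r @ x) cuts the multiply
    count from m*n to rank*(m + n) per input vector.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

def fit_runtime_model(ranks, measured_times):
    """Fit t(r) ≈ a*r + b to a handful of on-device timings (assumed form).

    `ranks` and `measured_times` are same-length 1-D sequences, e.g.
    timings of the factorized layer at a few candidate ranks.
    """
    a, b = np.polyfit(np.asarray(ranks, dtype=float),
                      np.asarray(measured_times, dtype=float), deg=1)
    return lambda r: a * r + b

# Example: compress a 512x512 layer to rank 64 and query the fitted model.
W = np.random.randn(512, 512)
U_r, V_r = low_rank_factorize(W, rank=64)
print("relative approx. error:", np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W))

predict = fit_runtime_model([16, 64, 128, 256], [0.8, 1.1, 1.9, 3.4])  # hypothetical ms
print("predicted ms at rank 96:", predict(96))
```

With such a per-layer predictor in hand, a rank budget across layers can be searched so that the summed predicted runtimes meet the device budget, which is the role the abstract assigns to the alternating optimization over weights and ranks.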
