  • SPS Members: Free
  • IEEE Members: $11.00
  • Non-members: $15.00
  • Length: 14:42
04 May 2020

Internet of Things (IoT) applications typically require a large number of heterogeneous devices distributed in the environment, which can generate large amounts of data for wireless transmission, affecting the energy requirements and lifetime of the devices. One strategy is computing at the very edge: performing advanced processing directly on the IoT node reduces the amount of transmitted data and the associated power consumption. Thanks to recent improvements in embedded technology, commercial microcontrollers with power consumption in the milliwatt range and enough computational power to enable Artificial Intelligence (AI) at the "thing level" are now available. Sound event detection (SED) is an example of an emerging IoT-based application, driven by growing interest in sensing technologies for smart cities. The recent release of new datasets and challenges (UrbanSound8K, AudioSet, ESC-50, and DCASE) has led to substantial advances in accuracy and robustness. Unfortunately, state-of-the-art algorithms employ very large neural networks that are increasingly hungry for computational power and memory, preventing their deployment on energy-neutral, low-cost IoT devices. This show & tell presents our implementation of state-of-the-art SED at the very edge, obtained by optimizing deep learning techniques for very low-cost, low-power embedded platforms with severe constraints on memory footprint and computational power. Using a student-teacher approach, we make a state-of-the-art neural network for sound event detection (based on VGGish) fit on current commercial microcontrollers by achieving extreme compression factors (from 70 million to 20 thousand parameters). We implement our model on an ARM Cortex-M4 using the CMSIS-NN library, adopting an efficient layer-wise 8-bit quantization of buffers and weights. Our real-time embedded implementation achieves 68% accuracy on UrbanSound8K, with an inference time of 125 ms for each second of audio and a power consumption of 5.5 mW, in just 34.3 kB of RAM.
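The student-teacher compression mentioned above follows the general knowledge-distillation recipe: a small student network is trained to match the softened output distribution of a large teacher (here, a VGGish-based model) in addition to the ground-truth labels. The PyTorch sketch below illustrates that recipe under standard Hinton-style assumptions; the function name, temperature T, and mixing weight alpha are illustrative and not taken from the video.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=3.0, alpha=0.5):
        # Softened teacher distribution: a higher T exposes inter-class structure.
        soft_targets = F.softmax(teacher_logits / T, dim=1)
        log_probs = F.log_softmax(student_logits / T, dim=1)
        # KL term scaled by T^2 so its gradient magnitude stays comparable
        # to the hard-label term as T changes.
        kd = F.kl_div(log_probs, soft_targets, reduction="batchmean") * (T * T)
        # Standard cross-entropy against the ground-truth class labels.
        ce = F.cross_entropy(student_logits, labels)
        return alpha * kd + (1.0 - alpha) * ce

In training, teacher_logits come from a frozen forward pass of the large network on the same batch, so only the student's parameters receive gradients.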
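The layer-wise 8-bit quantization can likewise be sketched under one common assumption: CMSIS-NN's q7 kernels work on signed Qm.n fixed-point values with power-of-two scales, so each layer can choose the number of fractional bits that just fits its largest weight. The helper below is a hypothetical illustration of that per-layer scheme, not the authors' tooling; calibration of activation ranges is omitted for brevity.

    import numpy as np

    def quantize_layer_q7(weights):
        # Integer bits needed for the largest magnitude in this layer.
        max_abs = float(np.max(np.abs(weights)))
        int_bits = max(0, int(np.ceil(np.log2(max_abs)))) if max_abs > 0 else 0
        frac_bits = 7 - int_bits      # signed 8-bit: 1 sign bit + 7 value bits
        scale = 2.0 ** frac_bits      # power-of-two scale -> bit shifts at runtime
        q = np.clip(np.round(weights * scale), -128, 127).astype(np.int8)
        return q, frac_bits           # frac_bits determines the kernel's shift amount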
