Real-time Speech Interruption Analysis: From Cloud to Client Deployment

Quchen Fu (Vanderbilt University); Szu-Wei Fu (Microsoft Corporation); Yaran Fan (Microsoft Corporation); Yu Wu (Microsoft Research Asia); Zhuo Chen (Microsoft); Jayant Gupchup (Microsoft); Ross Cutler (Microsoft Corporation)


06 Jun 2023

Meetings are an essential form of communication for all types of organizations, and remote collaboration systems have become much more widely used since the COVID-19 pandemic. One major issue with remote meetings is that it is challenging for remote participants to interrupt and speak. We recently developed WavLM_SI, the first speech interruption analysis model, which detects failed speech interruptions, shows very promising performance, and is being deployed in the cloud. To deliver this feature in a more cost-efficient and environmentally friendly way, we reduced the model's complexity and size so that WavLM_SI can ship on client devices. In this paper, we first describe how we improved the True Positive Rate (TPR) at a 1% False Positive Rate (FPR) of the failed speech interruption detection model from 50.9% to 68.3% by training on a larger dataset and fine-tuning. We then shrank the model from 222.7 MB to 9.3 MB with an acceptable loss in accuracy and reduced its complexity from 31.2 GMACS (Giga Multiply-Accumulate Operations per Second) to 4.3 GMACS. We also estimated the environmental impact of the complexity reduction; this estimate can serve as a general guideline for large Transformer-based models, making them more accessible with less computational overhead.
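The headline metric in the abstract, TPR at a fixed 1% FPR, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function name `tpr_at_fpr`, its parameters, and the decision rule `score > threshold` are assumptions made here for illustration only. The last lines simply redo the arithmetic on the figures quoted in the abstract.

```python
import math

def tpr_at_fpr(scores, labels, target_fpr=0.01):
    """Return (tpr, threshold) for the rule `score > threshold`,
    using the loosest threshold whose FPR does not exceed target_fpr.
    Hypothetical helper; labels are truthy for positive examples."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = sorted((s for s, y in zip(scores, labels) if not y), reverse=True)
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Number of negatives we can afford to flag as positive.
    k = math.floor(target_fpr * len(neg))
    # Thresholding at the (k+1)-th highest negative score lets at most
    # k negatives through, so the FPR stays at or below the target.
    threshold = neg[k] if k < len(neg) else -math.inf
    tpr = sum(s > threshold for s in pos) / len(pos)
    return tpr, threshold

# Arithmetic on the figures quoted in the abstract.
size_ratio = 222.7 / 9.3          # model size reduction factor
complexity_ratio = 31.2 / 4.3     # GMACS reduction factor
tpr_gain_points = 68.3 - 50.9     # TPR gain in percentage points
```

Read this way, the abstract's numbers correspond to a model roughly 24 times smaller, about 7.3 times less compute, and a gain of 17.4 percentage points in TPR at the 1% FPR operating point.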

Tags:

Speech emotion detection and analysis


Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
