Detecting Multiple Speech Disfluencies Using A Deep Residual Network With Bidirectional Long Short-Term Memory

Tedd Kourkounakis, Amirhossein Hajavi, Ali Etemad

Length: 14:41

04 May 2020

Stuttering is a speech impediment that affects tens of millions of people every day. Despite its prevalence, there is little data and research on the identification and classification of stuttered speech. This paper tackles the detection and classification of different forms of stutter. Unlike most existing work, which identifies stutters with language models, we propose a model that relies solely on acoustic features, allowing it to identify several types of stutter disfluency without the need for speech recognition. The model uses a deep residual network and bidirectional long short-term memory layers to classify different types of stutter and achieves an average miss rate of 10.03%, outperforming the state-of-the-art by almost 27%.
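The abstract reports its headline result as an average miss rate. As a rough illustration only, the sketch below assumes that "miss rate" means the false-negative rate for each disfluency type (missed instances divided by all actual instances) and that the average is an unweighted mean over types. The class names and counts are hypothetical placeholders, not figures from the paper.

```python
from typing import Dict, Tuple


def miss_rate(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual disfluent segments that the detector failed to flag."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no positive examples for this class")
    return false_negatives / total


def average_miss_rate(counts: Dict[str, Tuple[int, int]]) -> float:
    """Unweighted mean of per-class miss rates (an assumed averaging scheme)."""
    if not counts:
        raise ValueError("no classes given")
    rates = [miss_rate(tp, fn) for tp, fn in counts.values()]
    return sum(rates) / len(rates)


# Hypothetical (true positives, false negatives) per disfluency type.
example_counts = {
    "sound_repetition": (90, 10),
    "word_repetition": (80, 20),
    "prolongation": (95, 5),
}

if __name__ == "__main__":
    print(f"average miss rate: {average_miss_rate(example_counts):.2%}")
```

Under this definition a lower value is better: a miss rate of 10% would mean that roughly one in ten actual disfluencies goes undetected.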

Tags: sps conference, icassp 2020 virtual conference, May 2020, icassp 2020
