Improved End-To-End Spoken Utterance Classification With A Self-Attention Acoustic Classifier
Ryan Price, Mahnoosh Mehrabani, Srinivas Bangalore
While human language provides a natural interface for human-machine communication, several challenges remain in extracting a speaker's intent when interacting with a virtual agent, especially when the speaker is in a noisy acoustic environment. In this paper, we propose a new architecture for end-to-end spoken utterance classification (SUC) and explore the impact of leveraging lexical information in conjunction with the acoustic information obtained from the end-to-end model. We demonstrate that the model achieves strong performance with acoustic features alone compared to a text classifier operating on ASR outputs. Furthermore, when the acoustic and lexical embeddings from these classifiers are combined, accuracy on par with human agents can be achieved.
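The paper's exact architecture is not given in this abstract, but a minimal sketch of the general idea it describes (self-attention pooling over acoustic frames to form an utterance-level acoustic embedding, concatenated with a lexical embedding before intent classification) might look like the following. All layer choices, dimensions, and names here are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class SelfAttentionAcousticClassifier(nn.Module):
    """Hypothetical sketch: self-attention pooling over acoustic frames,
    fused with a lexical utterance embedding before classification."""

    def __init__(self, acoustic_dim=40, hidden_dim=128, lexical_dim=128, num_intents=20):
        super().__init__()
        # Frame-level acoustic encoder (architecture assumed, not from the paper).
        self.encoder = nn.GRU(acoustic_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        enc_dim = 2 * hidden_dim
        # Single-head additive self-attention scores, one per frame.
        self.attn = nn.Linear(enc_dim, 1)
        # Classifier over the concatenated acoustic + lexical embeddings.
        self.classifier = nn.Linear(enc_dim + lexical_dim, num_intents)

    def forward(self, frames, lexical_emb):
        # frames: (batch, time, acoustic_dim); lexical_emb: (batch, lexical_dim)
        enc, _ = self.encoder(frames)                   # (batch, time, enc_dim)
        weights = torch.softmax(self.attn(enc), dim=1)  # attention over time
        acoustic_emb = (weights * enc).sum(dim=1)       # pooled utterance embedding
        fused = torch.cat([acoustic_emb, lexical_emb], dim=-1)
        return self.classifier(fused)

# Usage: classify a batch of 2 utterances, each 300 frames of 40-dim features.
model = SelfAttentionAcousticClassifier()
logits = model(torch.randn(2, 300, 40), torch.randn(2, 128))
print(logits.shape)  # torch.Size([2, 20])
```

In this sketch the attention weights let the pooled embedding emphasize intent-bearing frames, and the late concatenation with a lexical embedding mirrors the abstract's finding that combining acoustic and lexical information improves accuracy over either alone.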