04 May 2020

While human language provides a natural interface for human-machine communication, several challenges in extracting a speaker's intent when interacting with a virtual agent, especially when the speaker is in a noisy acoustic environment, remain to be solved. In this paper, we propose a new architecture for end-to-end spoken utterance classification (SUC) and explore the impact of leveraging lexical information in conjunction with the acoustic information obtained from the end-to-end model. We demonstrate that the model achieves strong performance with acoustic features alone, compared to a text classifier on ASR outputs. Furthermore, when the acoustic and lexical embeddings from these classifiers are combined, accuracy on par with human agents can be achieved.
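The abstract describes combining acoustic embeddings from the end-to-end model with lexical embeddings from a text classifier. The paper's actual fusion method is not given here; the sketch below is a hypothetical late-fusion classifier in PyTorch, where the embedding dimensions, layer sizes, and intent count are illustrative assumptions rather than values from the paper.

```python
# A minimal sketch (not the authors' implementation) of late fusion:
# an acoustic encoder and a lexical encoder each produce an utterance-level
# embedding, and a small classifier head operates on their concatenation.

import torch
import torch.nn as nn

ACOUSTIC_DIM = 128   # assumed size of the end-to-end model's utterance embedding
LEXICAL_DIM = 256    # assumed size of the text classifier's utterance embedding
NUM_INTENTS = 10     # assumed number of intent classes

class FusionClassifier(nn.Module):
    """Classify intent from concatenated acoustic and lexical embeddings."""

    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(ACOUSTIC_DIM + LEXICAL_DIM, 256),
            nn.ReLU(),
            nn.Linear(256, NUM_INTENTS),
        )

    def forward(self, acoustic_emb, lexical_emb):
        # Late fusion: concatenate the two fixed-size utterance embeddings.
        fused = torch.cat([acoustic_emb, lexical_emb], dim=-1)
        return self.head(fused)  # logits over intent classes

# Example usage with random stand-in embeddings for a batch of 4 utterances.
model = FusionClassifier()
acoustic_emb = torch.randn(4, ACOUSTIC_DIM)
lexical_emb = torch.randn(4, LEXICAL_DIM)
logits = model(acoustic_emb, lexical_emb)
print(logits.shape)  # torch.Size([4, 10])
```

Concatenation followed by a small feed-forward head is one common way to fuse modality-specific embeddings; the paper may use a different scheme.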
