IMPROVING END-TO-END MODELS FOR SET PREDICTION IN SPOKEN LANGUAGE UNDERSTANDING

Hong-Kwang Kuo, Zoltan Tuske, Samuel Thomas, Brian Kingsbury, George Saon

Length: 00:10:24

10 May 2022

The goal of spoken language understanding (SLU) systems is to determine the meaning of the input speech signal, unlike speech recognition, which aims to produce verbatim transcripts. Advances in end-to-end (E2E) speech modeling have made it possible to train solely on semantic entities, which are far cheaper to collect than verbatim transcripts. We focus on this set prediction problem, where entity order is unspecified. Using two classes of E2E models, RNN transducers (RNN-T) and attention-based encoder-decoders, we show that these models work best when the training entity sequence is arranged in spoken order. To improve E2E SLU models when the spoken order of the entities is unknown, we propose a novel data augmentation technique along with an implicit attention-based alignment method to infer the spoken order. F1 scores increased significantly, by more than 11% for RNN-T and about 2% for attention-based encoder-decoder SLU models, outperforming previously reported results.
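Because the target is a set of entities rather than an ordered sequence, scoring should not depend on the order in which entities are produced. The following is a minimal sketch of an order-independent, micro-averaged F1 over entity sets. It is an illustration only, not the authors' scoring script: the `(slot, value)` tuple representation, the function name `set_f1`, and the example slot labels are assumptions made for this sketch.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: each entity is a (slot label, value) pair.
Entity = Tuple[str, str]


def set_f1(references: Iterable[Set[Entity]],
           hypotheses: Iterable[Set[Entity]]) -> float:
    """Micro-averaged F1 over per-utterance entity sets (order is ignored)."""
    tp = fp = fn = 0
    for ref, hyp in zip(references, hypotheses):
        tp += len(ref & hyp)   # entities predicted correctly
        fp += len(hyp - ref)   # predicted entities not in the reference
        fn += len(ref - hyp)   # reference entities that were missed
    if tp == 0:
        return 0.0             # also avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative utterance: two of three entities are correct.
ref = {("fromloc", "boston"), ("toloc", "denver"), ("depart_day", "monday")}
hyp = {("toloc", "denver"), ("fromloc", "boston"), ("depart_day", "tuesday")}
score = set_f1([ref], [hyp])   # precision = recall = 2/3, so F1 = 2/3
```

Using Python sets makes the comparison independent of prediction order, which matches the set prediction setting described in the abstract.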

Tags:

encoder-decoder

speech recognition

atis

attention

spoken language understanding

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
