End-To-End Spoken Language Understanding Without Matched Language Speech Model Pretraining Data
Ryan Price
In contrast to conventional approaches to spoken language understanding (SLU) that cascade a speech recognizer with a natural language understanding component, end-to-end (E2E) approaches for SLU infer semantics directly from the speech signal without processing it through separate subsystems. Pretraining part of the E2E model for speech recognition before finetuning the entire model on the target SLU task has proven to be an effective way to address the increased data requirements of E2E SLU models. However, transcribed corpora in the target language and domain may not always be available for pretraining an E2E SLU model. This paper proposes two strategies to improve the performance of E2E SLU models when transcribed data for pretraining in the target language is unavailable: multilingual pretraining with mismatched languages and data augmentation using SpecAugment. We demonstrate the effectiveness of these two methods for E2E SLU on two datasets, including one recently released publicly available dataset where we surpass the best previously published results despite not using any matched language data for pretraining.
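As a rough illustration of the second strategy, SpecAugment applies random frequency and time masks directly to the acoustic features before training. The sketch below, in NumPy, assumes log-mel features shaped (time, freq); the function name, mask counts, and mask widths are illustrative choices, not the paper's settings.

```python
import numpy as np

def spec_augment(log_mel, num_freq_masks=2, max_freq_width=15,
                 num_time_masks=2, max_time_width=35, rng=None):
    """SpecAugment-style masking on a (time, freq) log-mel spectrogram.

    Masked regions are set to zero, a common simplification when the
    features are mean-normalized. All parameter values are illustrative.
    """
    rng = rng or np.random.default_rng()
    augmented = log_mel.copy()
    num_frames, num_bins = augmented.shape

    # Frequency masking: zero out random contiguous bands of mel bins.
    for _ in range(num_freq_masks):
        width = int(rng.integers(0, max_freq_width + 1))
        start = int(rng.integers(0, max(1, num_bins - width)))
        augmented[:, start:start + width] = 0.0

    # Time masking: zero out random contiguous spans of frames.
    for _ in range(num_time_masks):
        width = int(rng.integers(0, max_time_width + 1))
        start = int(rng.integers(0, max(1, num_frames - width)))
        augmented[start:start + width, :] = 0.0

    return augmented

# Example: augment a random 300-frame, 80-bin log-mel spectrogram.
features = np.random.randn(300, 80).astype(np.float32)
augmented = spec_augment(features)
```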