A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge

Siddhant Arora (Carnegie Mellon University); Hayato Futami (Sony Group Corporation); Shih-Lun Wu (Carnegie Mellon University); Jessica Huynh (Carnegie Mellon University); Yifan Peng (Carnegie Mellon University); Yosuke Kashiwagi (Sony); Emiru Tsunoo (Sony Group Corporation); Brian Yan (Carnegie Mellon University); Shinji Watanabe (Carnegie Mellon University)

10 Jun 2023

Recently, there have been efforts to introduce new benchmark tasks for spoken language understanding (SLU), such as semantic parsing. In this paper, we describe our proposed spoken semantic parsing system for the quality track (Track 1) of the Spoken Language Understanding Grand Challenge, which is part of the ICASSP Signal Processing Grand Challenge 2023. We experiment with both end-to-end (E2E) and pipeline systems for this task. Strong automatic speech recognition (ASR) models such as Whisper and pretrained language models (LMs) such as BART are used within our SLU framework to boost performance. We also investigate output-level combination of various models, which achieves an exact-match accuracy of 80.8 and won first place in the challenge.
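The abstract reports results as exact-match accuracy and mentions output-level combination of several systems. The following is a minimal sketch that assumes each system emits one bracketed parse string per utterance. It computes exact-match accuracy and combines systems with a simple majority vote over whole parse strings. The function names, the tie-breaking rule, and the example parses are hypothetical illustrations. This is not necessarily the combination method used in the paper.

```python
from collections import Counter
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of utterances whose predicted parse equals the reference parse exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


def majority_vote(hypotheses: Sequence[str]) -> str:
    """Hypothetical combiner: return the parse output by the most systems.

    Ties are broken in favour of the system listed first.
    """
    if not hypotheses:
        raise ValueError("hypotheses must be non-empty")
    counts = Counter(hypotheses)
    best = max(counts.values())
    for hyp in hypotheses:  # scan in system order so ties keep the earliest system
        if counts[hyp] == best:
            return hyp
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Made-up TOP-style parses for two utterances (not taken from the STOP data).
    refs = [
        "[IN:GET_WEATHER [SL:LOCATION boston ] ]",
        "[IN:CREATE_ALARM [SL:DATE_TIME for 7 am ] ]",
    ]
    sys_a = [refs[0], "[IN:CREATE_ALARM [SL:DATE_TIME 7 am ] ]"]
    sys_b = [refs[0], refs[1]]
    sys_c = ["[IN:GET_WEATHER ]", refs[1]]

    # Vote per utterance across the three hypothetical systems.
    combined = [majority_vote(hyps) for hyps in zip(sys_a, sys_b, sys_c)]
    print(exact_match_accuracy(sys_a, refs))     # 50.0
    print(exact_match_accuracy(combined, refs))  # 100.0
```

Voting on whole parse strings is the simplest form of output-level combination. Finer-grained schemes that align the hypotheses and vote token by token are also possible but are not shown here.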

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and in-person conference.

More Like This

A Progressive Neural Network for Acoustic Echo Cancellation

Gesper: A Unified Framework for General Speech Restoration

Multi-Head Attention and GRU for Improved Match-Mismatch Classification of Speech Stimulus and EEG Response

Join the IEEE Signal Processing Society