E-BRANCHFORMER-BASED E2E SLU TOWARD STOP ON-DEVICE CHALLENGE
Yosuke Kashiwagi (Sony); Siddhant Arora (Carnegie Mellon University); Hayato Futami (Sony Group Corporation); Jessica Huynh (Carnegie Mellon University); Shih-Lun Wu (Carnegie Mellon University); Yifan Peng (Carnegie Mellon University); Brian Yan (Carnegie Mellon University); Emiru Tsunoo (Sony Group Corporation); Shinji Watanabe (Carnegie Mellon University)
In this paper, we report our team's study on track 2 of the Spoken Language Understanding Grand Challenge, which is a component of the ICASSP Signal Processing Grand Challenge 2023. The task targets on-device processing and involves estimating semantic parse labels from speech using a model with at most 15 million parameters. We use an E2E E-Branchformer-based spoken language understanding model, whose parameter count is easier to control than that of cascade models, and reduce the model size through sequential distillation and tensor decomposition techniques. On the STOP dataset, we achieve an exact match accuracy of 70.9% under the tight constraint of 15 million parameters.
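As a rough illustration of how tensor decomposition can shrink a model's parameter count (a generic low-rank sketch, not the authors' exact factorization), a dense weight matrix can be replaced by the product of two thin matrices obtained via truncated SVD:

```python
import numpy as np

def low_rank_decompose(W, rank):
    """Approximate W (m x n) as A @ B with A (m x rank), B (rank x n)
    using a truncated SVD. Parameter count drops from m*n to rank*(m + n)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))       # hypothetical layer weight
A, B = low_rank_decompose(W, rank=64)

orig_params = W.size                      # 512 * 512 = 262144
new_params = A.size + B.size              # 64 * (512 + 512) = 65536
print(orig_params, new_params)
```

Applied to the linear layers of a distilled student model, this kind of factorization trades a small approximation error for a roughly 4x reduction in those layers' parameters in the setting above; the paper's actual decomposition scheme and ranks may differ.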