Industry Workshop: WS-2: The new era of all-neural SLU: opportunities and challenges ahead
Jennifer Shumway, Ariya Rastrow, Björn Hoffmeister, Chris Ho
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 03:40:56
Speech recognition technology is completing a dramatic change: the move to an all-neural architecture that replaces the conventional stack of independently trained neural and non-neural subsystems. The neural architecture improves accuracy over a wide range of use cases; it blurs the boundary between speech recognition and language understanding, allowing for jointly trained models; and it enables multi-task learning that simultaneously solves transcription, segmentation, confidence estimation, and potentially more tasks. It also achieves superior memory and compute compression, enabling streaming low-latency speech recognition at the edge, where resources are constrained. When applied as end-to-end all-neural SLU (ASR + NLU), the tradeoff between compression and accuracy is even more favorable. The neural architecture enables truly multilingual systems that support within-sentence code switching, and it reduces reliance on human labeling through unsupervised pre-training, teacher/student semi-supervised training, the ability to incorporate user feedback signals, and learning from other modalities.

While the neural architecture has shown great results and leaves room for significant future improvements, it also presents new challenges. Personalization and adaptation are much easier in the conventional factored stack, where the finite-state language models can be adapted directly; this property is lost with end-to-end all-neural models. Making adaptation effective and practical for all-neural systems remains a challenge, one that requires focused innovation and investment in new neural architecture solutions. Rare-word modeling is another challenge: neural architectures learn acoustics and language jointly from paired audio/text data, whereas conventional architectures can train their language models on much larger text-only data sets.

In this workshop, we will provide an overview of the all-neural architecture developed by the Alexa ASR group, dive deep into some of the challenges and future opportunities, and conduct a panel discussion and Q&A session on the impact and future of the all-neural approach to speech recognition.
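To make the multi-task idea concrete, here is a minimal PyTorch-style sketch of a shared streaming encoder with per-task heads for the three tasks the abstract names (transcription, segmentation, confidence estimation). The class name `MultiTaskASRModel`, the LSTM encoder, and all layer sizes are illustrative assumptions, not the Alexa architecture discussed in the workshop.

```python
import torch
import torch.nn as nn

class MultiTaskASRModel(nn.Module):
    """Hypothetical shared encoder with jointly trained task heads."""

    def __init__(self, n_mels=80, d_model=512, vocab_size=4096):
        super().__init__()
        # Shared streaming acoustic encoder over log-mel frames.
        self.encoder = nn.LSTM(n_mels, d_model, num_layers=5, batch_first=True)
        # Per-task heads trained jointly on the shared representation.
        self.transcription_head = nn.Linear(d_model, vocab_size)  # token logits (e.g. for a CTC/RNN-T loss)
        self.segmentation_head = nn.Linear(d_model, 2)            # speech vs. end-of-utterance, per frame
        self.confidence_head = nn.Linear(d_model, 1)              # per-frame confidence score

    def forward(self, feats):
        enc, _ = self.encoder(feats)  # (batch, time, d_model)
        return {
            "tokens": self.transcription_head(enc),
            "segmentation": self.segmentation_head(enc),
            "confidence": torch.sigmoid(self.confidence_head(enc)),
        }

model = MultiTaskASRModel()
out = model(torch.randn(2, 100, 80))  # 2 utterances, 100 frames of 80-dim log-mel
print({k: v.shape for k, v in out.items()})
```

Because every head reads the same encoder output, the per-task losses can simply be summed during training, which is what lets one network replace several independently trained subsystems.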
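The teacher/student semi-supervised training mentioned above can be sketched as a pseudo-labeling loop: a trained teacher labels unlabeled audio, and the student learns from only the confident frames. The function below is a hedged illustration; `pseudo_label_step` and its frame-level formulation are assumptions for clarity, with both models mapping (batch, time, features) to per-frame token logits.

```python
import torch
import torch.nn.functional as F

def pseudo_label_step(teacher, student, optimizer, unlabeled_feats, conf_threshold=0.9):
    """One semi-supervised update on a batch of unlabeled audio features."""
    teacher.eval()
    with torch.no_grad():
        probs = teacher(unlabeled_feats).softmax(dim=-1)
        conf, pseudo_labels = probs.max(dim=-1)   # per-frame confidence and label
    mask = conf > conf_threshold                   # discard low-confidence frames
    if not mask.any():
        return None                                # nothing confident enough this batch
    loss = F.cross_entropy(student(unlabeled_feats)[mask], pseudo_labels[mask])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with frame-level linear "models" (purely illustrative);
# the threshold is lowered because these untrained toys are never confident.
teacher = torch.nn.Linear(80, 100)
student = torch.nn.Linear(80, 100)
opt = torch.optim.SGD(student.parameters(), lr=0.1)
print(pseudo_label_step(teacher, student, opt, torch.randn(2, 50, 80), conf_threshold=0.02))
```

The confidence threshold is the key knob: it trades pseudo-label coverage against label noise, which is why confidence estimation (one of the multi-task heads above) matters for reducing reliance on human labeling.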
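For the rare-word challenge, one widely used mitigation (not necessarily the approach the workshop will present) is shallow fusion: interpolating the end-to-end model's hypothesis scores with an external language model trained on text-only data. The toy unigram LM and score weighting below are illustrative assumptions.

```python
class ToyUnigramLM:
    """Stand-in for an external LM trained on large text-only corpora
    (hypothetical; a real system would use an n-gram or neural LM)."""
    def __init__(self, log_probs, oov_log_prob=-10.0):
        self.log_probs = log_probs
        self.oov = oov_log_prob

    def score(self, tokens):
        return sum(self.log_probs.get(t, self.oov) for t in tokens)

def shallow_fusion_rescore(hypotheses, external_lm, lm_weight=0.3):
    """Re-rank ASR n-best hypotheses by adding a weighted external LM score."""
    rescored = [(tokens, asr_lp + lm_weight * external_lm.score(tokens))
                for tokens, asr_lp in hypotheses]
    return sorted(rescored, key=lambda h: h[1], reverse=True)

# The rare word "hoffmeister" is boosted by the text-trained LM and
# overtakes the acoustically similar but wrong hypothesis.
lm = ToyUnigramLM({"play": -1.0, "hoffmeister": -4.0})
nbest = [(["play", "hof", "meister"], -3.0),   # (tokens, ASR log-prob)
         (["play", "hoffmeister"], -3.5)]
print(shallow_fusion_rescore(nbest, lm))
```

This illustrates the point in the abstract: the external LM can be trained on far more text than exists as paired audio/text, which is exactly the advantage the conventional factored stack enjoys natively.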