Gated contextual adapters for selective contextual biasing in neural transducers
Anastasios Alexandridis (Amazon.com); Kanthashree Mysore Sathyendra (Amazon); Grant Strimel (Amazon.com); Feng-Ju Chang (Amazon); Ariya Rastrow (Amazon Alexa); Nathan Susanj (Amazon.com); Athanasios Mouchtaris (Amazon Alexa)
SPS
Neural contextual biasing for end-to-end neural ASR transducers has shown significant improvements in the recognition of named entities, such as contact names or device names. However, it comes at the cost of increased compute, as the biasing layers (which are usually based on cross-attention) add complexity to the neural transducers. In this paper, we propose gated contextual biasing models that estimate at runtime when contextual biasing is needed and toggle it on or off accordingly. That way, contextual biasing does not run on every audio frame, but only on the frames where it can help recognition. We show that our gated contextual biasing models maintain all the performance improvements of contextual biasing while offering significant compute-cost savings, as contextual biasing needs to be executed for fewer than 15% of the audio frames.
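The gating idea can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: the abstract does not specify the gate's form, so the sketch assumes a per-frame sigmoid gate with a fixed threshold and a single-head dot-product cross-attention with a residual connection. All names (`gated_contextual_biasing`, `gate_w`, `gate_b`, `threshold`, `ctx_keys`, `ctx_values`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def gated_contextual_biasing(frames, ctx_keys, ctx_values,
                             gate_w, gate_b, threshold=0.5):
    """Apply cross-attention biasing only on frames where the gate fires.

    frames:     (T, d) encoder outputs, one row per audio frame
    ctx_keys:   (N, d) embeddings of the N context entries (e.g. contact names)
    ctx_values: (N, d) value vectors for the same entries
    gate_w, gate_b: parameters of an assumed per-frame sigmoid gate
    Returns the (possibly biased) frames and the number of frames biased.
    """
    frames = np.asarray(frames, dtype=float)
    d = frames.shape[1]
    out = frames.copy()
    n_biased = 0
    for t, x in enumerate(frames):
        gate = sigmoid(x @ gate_w + gate_b)
        if gate <= threshold:
            continue  # gate closed: skip the attention computation entirely
        scores = ctx_keys @ x / np.sqrt(d)   # (N,) similarity to each entry
        weights = softmax(scores)            # attention over context entries
        out[t] = x + weights @ ctx_values    # residual biasing vector
        n_biased += 1
    return out, n_biased
```

In this sketch the compute saving comes from the `continue` branch: on frames where the gate stays closed, only the cheap gate projection is evaluated and the attention over all context entries is skipped. The ratio `n_biased / len(frames)` corresponds to the fraction of frames on which biasing runs.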