
SLOT-TRIGGERED CONTEXTUAL BIASING FOR PERSONALIZED SPEECH RECOGNITION USING NEURAL TRANSDUCERS

Sibo Tong (Amazon); Philip Harding (Amazon Alexa); Simon Wiesler (Amazon)

06 Jun 2023

End-to-end (E2E) automatic speech recognition (ASR) models have been found to perform well on general transcription tasks but often fail to correctly recognize words that occur infrequently in the training data. Personalization is important for a variety of tasks, including virtual assistants, where recall of infrequently observed words such as contact names, song titles and place names is critical. In these cases, contextual information that can be used to bias the E2E ASR model is often available. Contextual biasing (CB) has been shown to be effective for this task; however, most existing work focuses on biasing for a single domain, so in this work we focus on applying biasing to multiple domains. We propose a method whereby the E2E ASR model is trained to emit opening and closing tags around slot content. These tags are used both to selectively enable biasing and to decide which catalog to use for biasing. Our method not only scales efficiently to multiple slots but also further improves accuracy on slot content.
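
The gating idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tag syntax (`<contact>`, `</contact>`), the slot names, the catalog contents and the function `slot_for_each_token` are all hypothetical, and the sketch runs over an already-emitted token sequence rather than inside a transducer's decoding loop. It only shows how an opening tag could switch biasing on and select a catalog, and how the matching closing tag could switch it off again.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical tag format (an assumption, not taken from the paper):
# "<contact>" opens a slot and "</contact>" closes it.
TAG = re.compile(r"^<(/?)(\w+)>$")


def slot_for_each_token(
    tokens: List[str], catalogs: Dict[str, List[str]]
) -> List[Tuple[str, Optional[str]]]:
    """Pair every non-tag token with the slot whose catalog would bias it.

    A token is paired with None when it lies outside any open slot,
    meaning biasing would be disabled for it.
    """
    active: Optional[str] = None  # slot currently open, if any
    paired: List[Tuple[str, Optional[str]]] = []
    for tok in tokens:
        match = TAG.match(tok)
        if match:
            is_closing = match.group(1) == "/"
            name = match.group(2)
            if is_closing:
                # Only the matching closing tag disables biasing.
                if name == active:
                    active = None
            elif name in catalogs:
                # An opening tag enables biasing and selects the catalog.
                active = name
            continue  # tags themselves are not content tokens
        paired.append((tok, active))
    return paired


if __name__ == "__main__":
    # Illustrative catalogs and utterance; all values are made up.
    example_catalogs = {"contact": ["sibo", "philip"], "song": ["yesterday"]}
    example_tokens = ["call", "<contact>", "sibo", "</contact>",
                      "then", "play", "<song>", "yesterday", "</song>"]
    for token, slot in slot_for_each_token(example_tokens, example_catalogs):
        print(token, "->", slot)
```

Because the catalog is chosen by the tag name, supporting an additional slot in this sketch only requires adding another entry to the catalog dictionary.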
