CACHING NETWORKS: CAPITALIZING ON COMMON SPEECH FOR ASR

Anastasios Alexandridis, Grant Strimel, Ariya Rastrow, Pavel Kveton, Jon Webb, Maurizio Omologo, Siegfried Kunzmann, Athanasios Mouchtaris

Length: 00:11:07
13 May 2022

We introduce Caching Networks (CachingNets), a speech recognition network architecture capable of delivering faster, more accurate decoding by leveraging common speech patterns. By explicitly incorporating select sentences unique to each user into the network's design, we show how to train the model as an extension of the popular sequence transducer architecture through a multitask learning procedure. We further propose and experiment with different phrase caching policies, which are effective for virtual voice-assistant (VA) applications, to complement the architecture. Our results demonstrate that by pivoting between different inference strategies on the fly, CachingNets can deliver significant performance improvements. Specifically, on an industrial-scale VA ASR task, we observe up to 7.4% relative word error rate (WER) and 11% sentence error rate (SER) improvements, with accompanying latency gains.
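To make the core idea concrete, below is a minimal sketch of a per-user phrase cache with a frequency-based caching policy and an inference routine that pivots between a cached-phrase fast path and full transducer decoding. All names here (PhraseCache, cache_scorer, full_decoder, the top-k policy, and the confidence threshold) are illustrative assumptions, not the paper's actual implementation or its specific caching policy.

```python
from collections import Counter

class PhraseCache:
    """Hypothetical per-user phrase cache illustrating the CachingNets idea:
    frequently spoken sentences are stored so the recognizer can emit them
    directly instead of always running a full beam-search decode. The policy
    here (top-k most frequent utterances) is an assumption for illustration."""

    def __init__(self, capacity: int = 16):
        self.capacity = capacity
        self.counts: Counter[str] = Counter()

    def observe(self, transcript: str) -> None:
        # Record a finalized transcript for this user.
        self.counts[transcript] += 1

    def cached_phrases(self) -> set[str]:
        # The k most common utterances form the active cache.
        return {p for p, _ in self.counts.most_common(self.capacity)}


def decode(audio, cache: PhraseCache, cache_scorer, full_decoder,
           threshold: float = 0.9):
    """Pivot between inference strategies on the fly (sketch):
    cache_scorer(audio, phrase) stands in for a model head that scores how
    well the audio matches a cached phrase; full_decoder(audio) stands in
    for the regular sequence-transducer beam search. Both are hypothetical."""
    best_phrase, best_score = None, 0.0
    for phrase in cache.cached_phrases():
        score = cache_scorer(audio, phrase)
        if score > best_score:
            best_phrase, best_score = phrase, score
    if best_phrase is not None and best_score >= threshold:
        return best_phrase          # fast path: emit the cached sentence
    return full_decoder(audio)      # fall back to full decoding
```

The fast path skips beam search entirely for high-confidence cache hits, which is one plausible source of the latency gains the abstract reports; the threshold trades off fast-path coverage against the risk of emitting a wrong cached sentence.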
