Transformer In Action: A Comparative Study Of Transformer-Based Acoustic Models For Large Scale Speech Recognition Applications

Yongqiang Wang, Yangyang Shi, Frank Zhang, Chunyang Wu, Julian Chan, Ching-Feng Yeh, Alex Xiao

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 00:07:56

10 Jun 2021

Transformer-based acoustic models have shown promising results very recently. In this paper, we summarize the application of transformer and its streamable variant, Emformer based acoustic model for large scale speech recognition applications. We compare the transformer based acoustic models with their LSTM counterparts on industrial scale tasks. Specifically, we compare Emformer with latency-controlled BLSTM (LCBLSTM) on medium latency tasks and LSTM on low latency tasks. On a low latency voice assistant task, Emformer gets 24% to 26% relative word error rate reductions (WERRs). For medium latency scenarios, comparing with LCBLSTM with similar model size and latency, Emformer gets significant WERR across four languages in video captioning datasets with 2-3 times inference real-time factors reduction.

Chairs:

Xiaodong Cui

Tags:

signal processing society

IEEE icassp 2021

virtual conference

2021

sps

virtual conference icassp 2021

june 6-11 2021

icassp 2021

Transformer In Action: A Comparative Study Of Transformer-Based Acoustic Models For Large Scale Speech Recognition Applications

Yongqiang Wang, Yangyang Shi, Frank Zhang, Chunyang Wu, Julian Chan, Ching-Feng Yeh, Alex Xiao

Value-Added Bundle(s) Including this Product

ICASSP 2021 Virtual Conference - Presentation Videos Product Bundle

More Like This

Panel: Building Sustainable Cities for Tomorrow

Panel: Unleashing the Potential of Virtual Power Plants for Sustainable Energy Solutions

Bundle: 2024 IEEE SustainTech Leadership Forum

Join the IEEE Signal Processing Society