12 May 2022

Self-attention has become an important component of end-to-end (E2E) automatic speech recognition (ASR). Recently, the Convolution-augmented Transformer (Conformer) with relative positional encoding (RPE) achieved state-of-the-art performance. However, the computational and memory complexity of self-attention grows quadratically with the input sequence length, which can be significant for the Conformer encoder when processing long sequences. In this work, we propose to replace self-attention with linear-complexity Nyström attention, a low-rank approximation of the attention scores based on the Nyström method. In addition, we propose to use Rotary Position Embedding (RoPE) with Nyström attention, since RPE has quadratic complexity. Moreover, we show that models can be made even lighter by removing self-attention sub-layers from the top encoder layers without any drop in performance. Furthermore, we demonstrate that the convolutional sub-layers in the Conformer can effectively recover the information lost due to the Nyström approximation.
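Below is a minimal NumPy sketch of the kind of low-rank softmax approximation the abstract refers to, following the general Nyström attention recipe (as in the Nyströmformer): the full n×n attention matrix is replaced by three small softmax kernels built from a handful of landmark queries and keys. Landmark selection by contiguous segment means, the use of `np.linalg.pinv`, and all parameter choices here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def nystrom_attention(Q, K, V, num_landmarks=32):
    """Approximate softmax(Q K^T / sqrt(d)) V in O(n * m) time, m = num_landmarks.

    Illustrative sketch; assumes num_landmarks divides the sequence length n.
    """
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)

    # Landmarks: mean-pool queries/keys over contiguous segments of the sequence.
    Q_land = Q.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)
    K_land = K.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)

    # Three small softmax kernels in place of the full n x n attention matrix.
    F1 = softmax(Q @ K_land.T * scale)        # (n, m)
    A  = softmax(Q_land @ K_land.T * scale)   # (m, m)
    F2 = softmax(Q_land @ K.T * scale)        # (m, n)

    # softmax(Q K^T) is approximated by F1 @ pinv(A) @ F2; apply it to V
    # without ever materializing an n x n matrix.
    return F1 @ (np.linalg.pinv(A) @ (F2 @ V))

# Usage: sequence length 512, head dimension 64 (arbitrary example sizes).
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((512, 64)) for _ in range(3))
out = nystrom_attention(Q, K, V, num_landmarks=32)
print(out.shape)  # (512, 64)
```

Because every intermediate product involves at most an n×m or m×m matrix, the cost scales linearly in the sequence length once the number of landmarks m is fixed, which is the property the abstract exploits for long acoustic sequences.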
