Role of Bias Terms in Dot-Product Attention

Mahdi Namazifar (Amazon Alexa AI); Devamanyu Hazarika (Amazon Alexa AI); Dilek Z Hakkani-Tur (Amazon Alexa AI)

06 Jun 2023

Dot-product attention is a core module in the current generation of neural network models, particularly transformers, and is used across numerous areas such as natural language processing and computer vision. This attention module comprises three linear transformations, namely the query, key, and value transformations, each of which has a bias term. In this work, we study the role of these bias terms and show mathematically that the bias term of the key linear transformation is redundant and can be omitted without any impact on the attention module. Moreover, we argue that the bias term of the value linear transformation plays a more prominent role than the bias term of the query linear transformation. We empirically verify these findings through multiple experiments on language modeling, natural language understanding, and natural language generation tasks.
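The key-bias claim can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code or experimental setup: it assumes single-head scaled dot-product attention with softmax normalization, and every name in it (`attention`, `linear`, `key_gap`, `value_gap`, the sizes `d` and `n`) is hypothetical. The reasoning it relies on is that the score q·(k_j + b_k) equals q·k_j + q·b_k, where the second term is the same for every position j, and softmax is unchanged when a constant is added to all of its inputs.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weight, bias):
    """Compute x @ weight + bias for a vector x and a d_in x d_out matrix."""
    d_out = len(weight[0])
    return [
        sum(x[i] * weight[i][j] for i in range(len(x))) + bias[j]
        for j in range(d_out)
    ]


def attention(tokens, wq, bq, wk, bk, wv, bv):
    """Single-head scaled dot-product attention over a list of token vectors."""
    queries = [linear(t, wq, bq) for t in tokens]
    keys = [linear(t, wk, bk) for t in tokens]
    values = [linear(t, wv, bv) for t in tokens]
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical toy sizes and random parameters, chosen only for illustration.
rng = random.Random(0)
d, n = 4, 5


def rand_vector(size):
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def rand_matrix(rows, cols):
    return [rand_vector(cols) for _ in range(rows)]


tokens = [rand_vector(d) for _ in range(n)]
wq, wk, wv = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)
bq, bk, bv = rand_vector(d), rand_vector(d), rand_vector(d)
zero = [0.0] * d

baseline = attention(tokens, wq, bq, wk, bk, wv, bv)
no_key_bias = attention(tokens, wq, bq, wk, zero, wv, bv)
no_value_bias = attention(tokens, wq, bq, wk, bk, wv, zero)

# Dropping the key bias should change the output only by rounding error.
key_gap = max_abs_diff(baseline, no_key_bias)
# Dropping the value bias shifts every output row by bv.
value_gap = max_abs_diff(baseline, no_value_bias)

print("max change without key bias:  ", key_gap)
print("max change without value bias:", value_gap)
```

In this sketch, removing the key bias leaves the output unchanged up to floating-point rounding. Removing the value bias shifts every output vector by exactly that bias, because the attention weights in each row sum to one. The example does not attempt to reproduce the paper's comparison between the value and query biases.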

Tags: language modeling

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
