
SLBERT: A NOVEL PRE-TRAINING FRAMEWORK FOR JOINT SPEECH AND LANGUAGE MODELING

Onkar Susladkar (Natter Labs); Prajwal Gatti (Dayananda Sagar College of Engineering); Santosh Kumar Yadav (Natter Labs)

06 Jun 2023

We propose SLBERT, a Speech and Language pre-training framework for BERT: an end-to-end trainable framework for learning joint representations of the speech and language modalities. We extend the well-known BERT architecture into a dual-stream multimodal architecture that processes both speech and language input. To enable effective information exchange between the two modalities, we introduce a novel attention fusion mechanism via AF-Blocks. To acquire robust contrastive representations for speech and language processing applications, we pre-train on three auxiliary tasks: Masked Language Modeling, Masked Speech Modeling, and Speech-Language Matching. We evaluate the proposed model on two well-known multimodal tasks, intent classification and sentiment analysis, and achieve state-of-the-art results on both benchmarks while surpassing even larger baselines.
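The abstract does not detail how an AF-Block exchanges information between the two streams. The sketch below is one plausible reading, assuming a symmetric cross-attention fusion in which each modality queries the other, followed by a residual connection and layer normalization; the module name, dimensions, and fusion rule are illustrative assumptions, not the authors' published implementation.

```python
# Minimal sketch of a hypothetical attention-fusion (AF) block between a speech
# stream and a text stream. All names and dimensions are assumptions for illustration.
import torch
import torch.nn as nn


class AFBlock(nn.Module):
    """Hypothetical AF-Block: each modality cross-attends to the other."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Cross-attention: text queries attend to speech keys/values, and vice versa.
        self.text_to_speech = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.speech_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_text = nn.LayerNorm(dim)
        self.norm_speech = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, speech: torch.Tensor):
        # text:   (batch, text_len,   dim) token embeddings from the language stream
        # speech: (batch, speech_len, dim) frame embeddings from the speech stream
        fused_text, _ = self.text_to_speech(query=text, key=speech, value=speech)
        fused_speech, _ = self.speech_to_text(query=speech, key=text, value=text)
        # Residual connection + layer norm, keeping the two streams separate.
        text = self.norm_text(text + fused_text)
        speech = self.norm_speech(speech + fused_speech)
        return text, speech


if __name__ == "__main__":
    block = AFBlock()
    text = torch.randn(2, 32, 768)     # 32 subword tokens
    speech = torch.randn(2, 200, 768)  # 200 acoustic frames
    t, s = block(text, speech)
    print(t.shape, s.shape)  # torch.Size([2, 32, 768]) torch.Size([2, 200, 768])
```

Keeping the two streams separate after fusion is consistent with the dual-stream design described in the abstract, where each modality retains its own representation while still attending to the other.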
