Leveraging Language Embeddings for Cross-lingual Self-supervised Speech Representation Learning

Tomohiro Tanaka (NTT); Ryo Masumura (NTT Corporation); Mana Ihori (NTT); Hiroshi Sato (NTT Corporation); Taiga Yamane (NTT); Takanori Ashihara (NTT Corp.); Kohei Matsuura (NTT); Takafumi Moriya (NTT)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
06 Jun 2023

In this paper, we propose novel cross-lingual self-supervised speech representation learning methods that explicitly take language information into account. Cross-lingual self-supervised speech representation learning has been studied as a way to make effective use of diverse data in various languages. Previous methods train models on multilingual datasets without taking language into account. However, without specifying the language, it is difficult to learn speech representations from multilingual datasets in a single shared space, since the acoustic context clearly differs between languages. To solve this problem, we propose leveraging language IDs to build self-supervised speech representation learning models. The proposed models convert language IDs into fixed-dimensional language embeddings so that the model can learn the relationships between related speech representations in different languages. We investigate two strategies for introducing the language embeddings into the models: adding the embeddings to all of the inputs, and concatenating them to the inputs of the Transformer. We experimentally investigate how the difference between the two strategies affects downstream tasks. Experimental results on English and Japanese datasets show that the proposed methods improve the accuracy of downstream automatic speech recognition tasks.
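
The two strategies named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the embedding size, the layer at which the embeddings are injected, or the axis used for concatenation, so the sizes, the language table, and both function names below are hypothetical, and the concatenation is assumed to run along the feature axis.

```python
import numpy as np

# Hypothetical sizes and language inventory, chosen only for illustration.
LANG_TO_ID = {"en": 0, "ja": 1}
FEAT_DIM = 8  # dimensionality of each frame-level feature (assumed)
EMB_DIM = 8   # dimensionality of the language embedding (assumed)

rng = np.random.default_rng(0)
# Lookup table with one fixed-dimensional vector per language ID.
# In a real model these vectors would be learned; here they are random.
lang_table = rng.normal(size=(len(LANG_TO_ID), EMB_DIM))


def add_language_embedding(frames, lang):
    """Add the language embedding to every frame (needs EMB_DIM == FEAT_DIM)."""
    emb = lang_table[LANG_TO_ID[lang]]
    return frames + emb  # broadcasts over the time axis


def concat_language_embedding(frames, lang):
    """Append the language embedding to every frame along the feature axis."""
    emb = lang_table[LANG_TO_ID[lang]]
    tiled = np.tile(emb, (frames.shape[0], 1))  # repeat once per frame
    return np.concatenate([frames, tiled], axis=-1)


frames = rng.normal(size=(5, FEAT_DIM))  # 5 frames of dummy features
added = add_language_embedding(frames, "ja")
concatenated = concat_language_embedding(frames, "ja")
print(added.shape, concatenated.shape)
```

Under these assumptions, addition keeps the feature dimension unchanged, while concatenation widens each frame by the embedding size, so any layer that consumes the concatenated input would need a correspondingly larger input dimension.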
