Improving The Performance Of Transformer Based Low Resource Speech Recognition For Indian Languages

Vishwas M. Shetty, Metilda Sagaya Mary N J, Srinivasan Umesh

Length: 14:05

04 May 2020

The recent success of the Transformer-based sequence-to-sequence framework for various Natural Language Processing tasks has motivated its application to Automatic Speech Recognition. In this work, we explore the application of Transformers to low-resource Indian languages in a multilingual framework. We explore various methods of incorporating language information into a multilingual Transformer, i.e., (i) at the decoder and (ii) at the encoder. These methods include using language identity tokens or providing language information to the acoustic vectors. Language information can be given to the acoustic vectors in the form of a one-hot vector or by learning a language embedding. In our experiments, we observed that providing language identity always improved performance. The language embedding learned with our proposed approach, when added to the acoustic feature vector, gave the best result. The proposed approach with retraining gave 6% to 11% relative improvements in character error rate over the monolingual baseline.
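The three ways of supplying language information mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the 80-dimensional features, the placeholder language list, and the token offset are all assumptions made for illustration, and the embedding table is filled with random values where a real system would learn it during training.

```python
import numpy as np

# Hypothetical placeholder language inventory (not taken from the paper).
LANGUAGES = ["lang_a", "lang_b", "lang_c"]


def one_hot_concat(features, lang_id, num_langs):
    """Append a one-hot language vector to every acoustic frame."""
    one_hot = np.zeros(num_langs, dtype=features.dtype)
    one_hot[lang_id] = 1.0
    tiled = np.tile(one_hot, (features.shape[0], 1))
    return np.concatenate([features, tiled], axis=1)


def add_language_embedding(features, lang_id, embedding_table):
    """Add a language embedding (same width as the features) to every frame."""
    return features + embedding_table[lang_id]


def prepend_language_token(target_ids, lang_id, token_offset):
    """Prepend a language identity token to the decoder target sequence."""
    return [token_offset + lang_id] + list(target_ids)


rng = np.random.default_rng(0)
# 5 frames of 80-dimensional acoustic features (dimensions are assumed).
frames = rng.standard_normal((5, 80)).astype(np.float32)
# Stand-in for a learned embedding table: one 80-dim vector per language.
table = rng.standard_normal((len(LANGUAGES), 80)).astype(np.float32)

with_one_hot = one_hot_concat(frames, 1, len(LANGUAGES))
with_embedding = add_language_embedding(frames, 1, table)
targets = prepend_language_token([7, 8, 9], 1, token_offset=1000)
```

Concatenating a one-hot vector widens each frame by the number of languages, whereas adding an embedding keeps the feature dimension unchanged, so the encoder input size stays the same as in a monolingual model.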

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020
