Dynamically Weighted Ensemble Models For Automatic Speech Recognition
Kiran Praveen, Abhishek Pandey, Deepak Kumar, Shakti Prasad Rath, Sandip Bapat
In machine learning, training multiple models for the same task and combining their outputs helps reduce the variance of the final result. Using an ensemble of models in classification tasks such as Automatic Speech Recognition (ASR) improves accuracy across different target domains, such as multiple accents, environmental conditions, and other scenarios. Model weights for the ensemble can be selected in numerous ways: a classifier trained to identify the target domain, a simple averaging function, or an exhaustive grid search are the common approaches to obtain suitable weights. All of these methods either choose sub-optimal weights or are computationally expensive. We propose a novel and practical method for dynamic weight selection in an ensemble, which approximates a grid search in a time-efficient manner. We show that a suitably chosen combination of weights always performs better than assigning uniform weights to all models. Our algorithm can utilize a validation set if available, or find weights dynamically from the input utterance itself. Experiments conducted on various ASR tasks show that the proposed method outperforms the uniformly weighted ensemble in terms of Word Error Rate (WER).
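To make the weighted-ensemble idea concrete, the sketch below is a minimal illustration, not the authors' implementation: it assumes the ensemble mixes per-frame posteriors from K acoustic models with a weight vector on the probability simplex, and that a caller supplies a scoring function (here the hypothetical decode_wer, evaluated on a validation set or on the input utterance itself). The coarse grid step of 0.25 is likewise an illustrative assumption showing how a restricted search can approximate an exhaustive grid search with far fewer decodes.

    # Minimal sketch (illustrative only): weighted combination of per-frame
    # posteriors from K models, with a coarse search over the weight simplex
    # that stands in for an exhaustive grid search.
    import itertools
    import numpy as np

    def combine_posteriors(posteriors, weights):
        # posteriors: list of K arrays, each of shape (num_frames, num_states)
        # weights:    length-K array of non-negative values summing to 1
        stacked = np.stack(posteriors, axis=0)         # (K, T, S)
        return np.tensordot(weights, stacked, axes=1)  # weighted sum -> (T, S)

    def coarse_weight_search(posteriors, decode_wer, step=0.25):
        # Evaluate weight vectors on a coarse grid over the simplex and keep
        # the one with the lowest score (e.g. WER on a validation set, or a
        # confidence proxy computed from the utterance itself).
        k = len(posteriors)
        ticks = np.arange(0.0, 1.0 + 1e-9, step)
        best_w, best_score = np.full(k, 1.0 / k), float("inf")
        for combo in itertools.product(ticks, repeat=k):
            if not np.isclose(sum(combo), 1.0):
                continue                               # stay on the simplex
            w = np.asarray(combo)
            score = decode_wer(combine_posteriors(posteriors, w))
            if score < best_score:
                best_w, best_score = w, score
        return best_w, best_score

    # Example usage with toy posteriors and a dummy confidence-based scorer:
    # rng = np.random.default_rng(0)
    # posts = [rng.dirichlet(np.ones(5), size=100) for _ in range(2)]
    # w, s = coarse_weight_search(posts, decode_wer=lambda p: -np.log(p.max(axis=1)).mean())

One design point the sketch highlights: the number of grid points grows rapidly with the number of models and inversely with the step size, which is why a full fine-grained grid search is expensive and a time-efficient approximation, as proposed in the paper, is attractive.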