Pre-Training In Deep Reinforcement Learning For Automatic Speech Recognition

Thejan Rajapakshe, Rajib Rana, Siddique Latif, Sara Khalifa, Björn Schuller

04 May 2020

Deep reinforcement learning (deep RL) combines deep learning with reinforcement learning principles to create efficient methods that learn by interacting with their environment. Deep RL has led to breakthroughs in many complex tasks that were previously difficult to solve. However, it requires long training times, which makes it difficult to use in real-life applications such as human-computer interaction (HCI). In this work, we therefore study pre-training in deep RL to reduce the training time and improve performance in speech recognition. The policy model in RL was implemented with the well-known TensorFlow Python library and combines Convolutional Neural Network (CNN) layers with Long Short-Term Memory (LSTM) layers; this combination allows us to improve performance on speech recognition tasks. We cast RL as a guessing game over speech commands: the environment compares each guess against the ground truth and returns a positive reward for a correct guess and a negative reward for an incorrect one. The study was carried out on the publicly available Speech Commands Dataset. Before RL starts, the policy model is pre-trained on a separate subset of the selected dataset, and the REINFORCE algorithm is used to approximate the policy gradient. We compared the average score between the "with pre-training" and "without pre-training" paradigms. After 10,000 episodes, the score with pre-training increased by about 50% for the higher-class classifiers. The results show that pre-training helps achieve considerably better results in fewer episodes. Moreover, the score rises faster with pre-training than without it. The proposed model thus uses pre-training knowledge to achieve a better score while reducing convergence time.
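To make the described setup concrete, the sketch below shows one plausible reading of the approach: a CNN + LSTM policy network over speech features and a single REINFORCE update for the speech-command guessing game with +1/-1 rewards. This is not the authors' code; the layer sizes, feature shapes, and optimizer settings are illustrative assumptions.

```python
# Minimal sketch (assumed shapes and hyperparameters, not the authors' implementation)
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 10              # number of speech commands (assumption)
TIME_STEPS, N_MELS = 98, 40   # e.g. log-mel spectrogram frames x bands (assumption)

def build_policy():
    """CNN layers extract local spectral patterns; an LSTM summarises the
    temporal sequence; a softmax head gives action (class) probabilities."""
    inp = layers.Input(shape=(TIME_STEPS, N_MELS, 1))
    x = layers.Conv2D(16, (3, 3), activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Collapse the frequency/channel axes so the LSTM runs over time steps
    x = layers.Reshape((x.shape[1], x.shape[2] * x.shape[3]))(x)
    x = layers.LSTM(64)(x)
    out = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return models.Model(inp, out)

policy = build_policy()
optimizer = tf.keras.optimizers.Adam(1e-4)

def reinforce_step(features, label):
    """One episode of the guessing game: sample a class from the policy,
    receive a +1 / -1 reward, and apply the REINFORCE gradient."""
    with tf.GradientTape() as tape:
        probs = policy(features[None, ...], training=True)[0]
        action = tf.random.categorical(tf.math.log(probs)[None, :], 1)[0, 0]
        reward = 1.0 if int(action) == int(label) else -1.0
        # REINFORCE loss: -reward * log pi(action | state)
        loss = -reward * tf.math.log(tf.gather(probs, action) + 1e-8)
    grads = tape.gradient(loss, policy.trainable_variables)
    optimizer.apply_gradients(zip(grads, policy.trainable_variables))
    return reward
```

Under this reading, the pre-training phase would amount to first fitting `policy` with ordinary supervised cross-entropy on the held-out subset of the Speech Commands Dataset (e.g. via `policy.compile(...)` and `policy.fit(...)`) before running `reinforce_step` in the RL phase, so that REINFORCE starts from an informed policy rather than a random one.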