SPS
Length: 14:46
Deep-learning-based approaches have greatly improved the performance of spoken keyword spotting (KWS). However, to perform well, KWS for each language should use modeling units suited to that language. In this paper, we propose an end-to-end Mandarin KWS system built on a Convolutional Recurrent Neural Network trained with the Connectionist Temporal Classification (CTC) loss (CRNN-CTC), with tonal syllables as the modeling units. Experiments on the AISHELL-2 dataset show that the proposed approach achieves a false rejection rate of 5.35% at 0.26 false alarms per hour (FA/hour) on a 13-keyword task and 6.37% at 0.17 FA/hour on a 20-keyword task.
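The abstract does not say how the CRNN-CTC outputs are scored or turned into detections, so the following is only a minimal Python sketch (standard library only) of the general ingredients, under stated assumptions. `VOCAB` and `KEYWORD_LEXICON` are hypothetical, the hand-written frame posteriors stand in for a CRNN's softmax outputs, `ctc_loss` is the textbook CTC forward algorithm, and the detection rule (greedy decoding plus a contiguous-match check) is an illustrative choice, not necessarily the paper's. The two metric helpers use the usual definitions of false rejection rate and false alarms per hour.

```python
import math

BLANK = "<b>"  # CTC blank symbol (name chosen for this sketch)

# Hypothetical vocabulary of tonal-syllable units (pinyin + tone digit).
# With tonal units, e.g. ma1 and ma3 would be distinct output classes.
VOCAB = [BLANK, "ni3", "hao3", "ma5"]

# Hypothetical keyword lexicon: each keyword is spelled as tonal syllables.
KEYWORD_LEXICON = {"你好": ["ni3", "hao3"]}


def ctc_loss(posteriors, vocab, labels):
    """Negative log-likelihood of `labels` under CTC (forward algorithm).

    posteriors[t][k] is the network's softmax output for vocab[k] at frame t.
    Works in probability space for readability; real systems use log space.
    """
    idx = {sym: i for i, sym in enumerate(vocab)}
    blank = idx[BLANK]
    ext = [blank]  # labels with blanks interleaved: b y1 b y2 ... b
    for sym in labels:
        ext += [idx[sym], blank]
    S = len(ext)
    alpha = [0.0] * S
    alpha[0] = posteriors[0][ext[0]]
    if S > 1:
        alpha[1] = posteriors[0][ext[1]]
    for frame in posteriors[1:]:
        new = [0.0] * S
        for s in range(S):
            total = alpha[s] + (alpha[s - 1] if s >= 1 else 0.0)
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * frame[ext[s]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank.
    prob = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(prob) if prob > 0 else math.inf


def greedy_decode(posteriors, vocab):
    """Best-path CTC decoding: frame-wise argmax, merge repeats, drop blanks."""
    out, prev = [], None
    for frame in posteriors:
        sym = vocab[max(range(len(frame)), key=frame.__getitem__)]
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return out


def detect(posteriors, vocab, keyword):
    """Fire if the keyword's syllables appear contiguously in the decoding."""
    target = KEYWORD_LEXICON[keyword]
    hyp = greedy_decode(posteriors, vocab)
    n = len(target)
    return any(hyp[i:i + n] == target for i in range(len(hyp) - n + 1))


def false_rejection_rate(num_missed, num_keyword_utts):
    """Fraction of keyword utterances the detector failed to fire on."""
    return num_missed / num_keyword_utts


def false_alarms_per_hour(num_false_alarms, negative_audio_seconds):
    """False alarms normalized by hours of non-keyword audio."""
    return num_false_alarms / (negative_audio_seconds / 3600.0)


# Toy posteriors over VOCAB for four frames (made up for illustration).
frames = [
    [0.10, 0.80, 0.05, 0.05],  # mostly ni3
    [0.70, 0.10, 0.10, 0.10],  # mostly blank
    [0.10, 0.05, 0.80, 0.05],  # mostly hao3
    [0.90, 0.03, 0.04, 0.03],  # mostly blank
]
print(greedy_decode(frames, VOCAB))   # ['ni3', 'hao3']
print(detect(frames, VOCAB, "你好"))  # True
```

In practice the forward recursion is run in log space (or through a framework's built-in CTC loss) to avoid underflow on long utterances; the probability-space version here only keeps the recursion readable. Under these metric definitions, 0.26 FA/hour corresponds to roughly one false alarm per 3.8 hours of non-keyword audio.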