Length: 17:35
04 May 2020

A mispronunciation detection and diagnosis (MD&D) system typically consists of multiple stages, such as an acoustic model, a language model and a Viterbi decoder. To integrate these stages into a single model, we propose SED-MDD, an end-to-end model for sentence-dependent MD&D. The model takes a mel-spectrogram and a character sequence as inputs and outputs the corresponding phone sequence. Our experiments show that SED-MDD can implicitly learn phonological rules from both acoustic and linguistic features, directly from the phonological annotations and transcriptions in the training data. To the best of our knowledge, SED-MDD is the first model of its kind. It achieves an accuracy of 86.35% and a correctness of 88.61% on L2-ARCTIC, significantly outperforming the existing end-to-end MD&D model CNN-RNN-CTC.
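Assuming the reported figures follow the usual HTK-style phone-recognition convention, they come from a minimum-edit-distance alignment between the recognized phone sequence and the annotated reference of N phones. Counting substitutions (S), deletions (D) and insertions (I) gives Correctness = (N - S - D) / N and Accuracy = (N - S - D - I) / N. The minimal Python sketch below works under that assumption; the function names and example phone sequences are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) from a Levenshtein alignment."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from the bottom-right cell, attributing each step to an error type
    s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            s += int(ref[i - 1] != hyp[j - 1])  # match (0) or substitution (1)
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            d += 1  # reference phone missing from the hypothesis
            i -= 1
        else:
            ins += 1  # extra phone in the hypothesis
            j -= 1
    return s, d, ins


def correctness_and_accuracy(ref: List[str], hyp: List[str]) -> Tuple[float, float]:
    """HTK-style %Corr and Acc (as fractions), assuming the conventional definitions."""
    if not ref:
        raise ValueError("reference phone sequence must be non-empty")
    s, d, i = align_counts(ref, hyp)
    n = len(ref)
    correctness = (n - s - d) / n
    accuracy = (n - s - d - i) / n
    return correctness, accuracy


if __name__ == "__main__":
    # Hypothetical annotated phones vs. recognized phones
    ref = ["dh", "ah", "k", "ae", "t"]
    hyp = ["d", "ah", "k", "ae", "t", "s"]
    print(correctness_and_accuracy(ref, hyp))
```

In this example the alignment has one substitution (dh to d) and one insertion (s), so correctness is 4/5 = 0.8 and accuracy is 3/5 = 0.6. Because correctness ignores insertions, it can never be lower than accuracy, which is consistent with the abstract reporting 88.61% correctness against 86.35% accuracy.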