Relative dynamic time warping comparison for pronunciation errors

Caitlin Richter (Reykjavik University); Jon Gudnason (Reykjavik University)

09 Jun 2023

We propose a dynamic time warping (DTW) difference-to-sum ratio for classifying speech as either matching or diverging from a linguistic standard. The measure effectively recognises mispronunciations by non-native speakers of Norwegian, at both the word level and the phonetic-segment level. The contributions of the approach are (a) identifying pronunciation errors from DTW comparisons against two parallel sources, one representing the linguistic standard (e.g. native speakers) and one representing an error model; (b) recognising a heterogeneous standard, in this case the highly variable range of Norwegian dialects, rather than only a single specified canonical phoneme sequence; (c) handling unanticipated pronunciation variants, both acceptable and unacceptable, beyond those seen in the standard and error models; and (d) requiring minimal training or pretraining data in the target language, which makes pronunciation error detection feasible even for low-resource languages that lack functional ASR.
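The core idea of comparing one utterance against two parallel reference sets can be illustrated with a minimal sketch. It assumes (these are assumptions, not details given in the abstract) that each utterance is a 2-D array of acoustic feature frames, that DTW uses a Euclidean frame cost normalised by the combined sequence length, that the distance to each source is the minimum over its references (so several dialect recordings can jointly stand for a heterogeneous standard), and that the score is `(d_err - d_std) / (d_err + d_std)`. The function names `dtw_distance` and `relative_dtw_score` are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np


def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Plain DTW between two (frames, features) arrays.

    Assumption: Euclidean frame cost, result normalised by (n + m).
    """
    n, m = len(x), len(y)
    # Pairwise frame-to-frame Euclidean costs, shape (n, m).
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of the three allowed predecessor steps.
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return float(acc[n, m] / (n + m))


def relative_dtw_score(test, standard_refs, error_refs, eps=1e-12):
    """Hypothetical difference-to-sum ratio in [-1, 1].

    Positive: the test utterance is closer to the standard references.
    Negative: it is closer to the error-model references.
    """
    d_std = min(dtw_distance(test, r) for r in standard_refs)
    d_err = min(dtw_distance(test, r) for r in error_refs)
    return (d_err - d_std) / (d_err + d_std + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for feature sequences (not real speech data).
    standard = [rng.normal(size=(20, 13)) for _ in range(3)]
    errors = [rng.normal(loc=2.0, size=(18, 13)) for _ in range(2)]
    test = standard[0] + 0.05 * rng.normal(size=standard[0].shape)
    score = relative_dtw_score(test, standard, errors)
    # Hypothetical decision rule: threshold the ratio at 0.
    print(score, "matches standard" if score > 0 else "diverges")
```

Normalising by the ratio of the two distances, rather than thresholding a single DTW distance, makes the decision relative: an utterance is judged by which source it resembles more, so the raw scale of the distances matters less.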
