Relative dynamic time warping comparison for pronunciation errors
Caitlin Richter (Reykjavik University); Jon Gudnason (Reykjavik University)
SPS
We propose using a dynamic time warping (DTW) difference-to-sum ratio to classify speech as either matching or diverging from a linguistic standard. This measure effectively recognises non-native Norwegian speakers' mispronunciations in words and phonetic segments. The contributions of the approach include (a) using DTW comparisons from two parallel sources, which represent the linguistic standard (e.g. native speakers) and an error model, to identify pronunciation errors; (b) recognising a heterogeneous standard, in this case the highly variable range of Norwegian dialects, instead of only a specified canonical phoneme sequence; (c) handling unanticipated pronunciation variants, both acceptable and unacceptable, beyond those seen in the standard and error models; and (d) requiring minimal training or pretraining data in the target language, which helps to make pronunciation error detection accessible even in low-resource languages without functional ASR.
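To make the idea of a difference-to-sum ratio concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not give the exact formula, so several details here are assumptions: the ratio is taken as (d_err - d_std) / (d_err + d_std), each distance is the minimum length-normalised DTW cost over a set of reference recordings, frames are compared with Euclidean distance, and the names `dtw_cost` and `relative_dtw_score` are hypothetical.

```python
import numpy as np


def dtw_cost(a, b):
    """Length-normalised DTW cost between two feature sequences (frames x dims)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # Pairwise Euclidean distance between every frame of a and every frame of b.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Accumulated-cost matrix with an infinite border so paths start at (0, 0).
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    # Normalising by n + m is one common choice (an assumption here).
    return acc[n, m] / (n + m)


def relative_dtw_score(test, standard_refs, error_refs):
    """Difference-to-sum ratio in [-1, 1]; positive means closer to the standard."""
    # Assumption: use the nearest reference in each set.
    d_std = min(dtw_cost(test, ref) for ref in standard_refs)
    d_err = min(dtw_cost(test, ref) for ref in error_refs)
    total = d_std + d_err
    if total == 0.0:
        return 0.0  # test matches both models exactly; no evidence either way
    return (d_err - d_std) / total


# Toy one-dimensional "features"; real input would be acoustic feature frames.
test_utterance = np.array([[0.0], [1.0], [2.0]])
standard_refs = [np.array([[0.0], [1.0], [1.0], [2.0]])]
error_refs = [np.array([[5.0], [6.0], [7.0]])]
score = relative_dtw_score(test_utterance, standard_refs, error_refs)
```

Under these assumptions the score is bounded between -1 and 1, so a single threshold (for example 0, itself an assumption) could separate speech that matches the standard from speech that diverges from it. Because the score depends only on the relative distances to the two reference sets, a heterogeneous standard can be represented simply by including references from several dialects.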