Relative dynamic time warping comparison for pronunciation errors

Caitlin Richter (Reykjavik University); Jon Gudnason (Reykjavik University)

09 Jun 2023

We propose using a dynamic time warping (DTW) difference-to-sum ratio to classify speech as either matching or diverging from a linguistic standard. This measure effectively recognises non-native Norwegian speakers' mispronunciations in words and phonetic segments. The contributions of the approach include (a) using DTW comparisons from two parallel sources, which represent the linguistic standard (e.g. native speakers) and an error model, to identify pronunciation errors; (b) recognising a heterogeneous standard, in this case the highly variable range of Norwegian dialects, instead of only a specified canonical phoneme sequence; (c) handling unanticipated pronunciation variants, both acceptable and unacceptable, beyond those seen in the standard and error models; and (d) requiring minimal training or pretraining data in the target language, which helps to make pronunciation error detection accessible even in low-resource languages without functional ASR.
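The abstract does not give the exact formula, so the following is only a minimal sketch of what a DTW difference-to-sum ratio could look like, under stated assumptions. It assumes the ratio is (d_err - d_std) / (d_err + d_std), where d_std and d_err are DTW alignment costs from the learner's utterance to the standard references and to the error-model references respectively. It also assumes Euclidean frame distance and that the closest reference in each set is used. The names `dtw_cost` and `relative_dtw_score` are hypothetical and do not come from the paper.

```python
import math


def dtw_cost(a, b):
    """Classic DTW alignment cost between two sequences of feature vectors.

    Uses Euclidean distance between frames and the standard
    (match, insertion, deletion) step pattern, with no normalisation.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def relative_dtw_score(sample, standard_refs, error_refs):
    """Difference-to-sum ratio of DTW costs, in the range [-1, 1].

    Assumption: the nearest reference in each set represents that set.
    Positive values mean the sample is closer to the standard than to
    the error model; negative values mean the opposite.
    """
    d_std = min(dtw_cost(sample, ref) for ref in standard_refs)
    d_err = min(dtw_cost(sample, ref) for ref in error_refs)
    total = d_err + d_std
    if total == 0:
        return 0.0  # sample matches both sets exactly: no evidence either way
    return (d_err - d_std) / total


# Toy one-dimensional "features"; real input would be acoustic frames.
sample = [(0.0,), (1.0,), (2.0,)]
standard_refs = [[(0.0,), (1.0,), (1.0,), (2.0,)]]  # same shape, slower
error_refs = [[(0.0,), (3.0,), (2.0,)]]             # deviates in the middle
score = relative_dtw_score(sample, standard_refs, error_refs)
```

Because the ratio is bounded and relative, a decision threshold (for example, accepting speech when the score is above zero) does not depend on utterance length or on the absolute scale of the features. The threshold itself is an illustrative choice here, not one stated in the abstract.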

Tags: Language acquisition and learning

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
