  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:08:00
13 May 2022

Ultrasound tongue imaging is an attractive tool for studying speech production because it provides an effective visualization of the vocal tract. Automatic classification of phonetic segments (tongue shapes) from raw ultrasound data is vital for further interpretation. Recently, deep learning-based approaches have been adopted for this task, but they require a large-scale annotated dataset for training, which is not easy to obtain. Moreover, the data may contain many hard examples for the classification task because of speckle noise contamination. In this paper, we address both issues: first, self-supervised learning is adopted to exploit unlabeled datasets and extract features without any human annotations; second, hard example mining is applied to imitate the learning path of clinical linguists. To demonstrate the proposed method's effectiveness empirically, we evaluate it on the Ultrax Typically Developing dataset (UXTD) under different scenarios. The results show that the proposed method outperforms the other methods evaluated.
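
The abstract does not say how hard examples are selected. As a rough illustration only, the sketch below assumes one common loss-based variant: compute a per-sample cross-entropy loss for a batch, then keep the fraction with the highest loss for the update. The function names, the `keep_ratio` parameter, and the toy probabilities are hypothetical and are not taken from the paper.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for one sample."""
    return -math.log(max(probs[label], 1e-12))


def mine_hard_examples(batch_probs, labels, keep_ratio=0.5):
    """Return indices of the highest-loss samples and their mean loss.

    keep_ratio is a hypothetical hyperparameter (not from the paper):
    the fraction of the batch treated as "hard" and kept for the update.
    """
    losses = [cross_entropy(p, y) for p, y in zip(batch_probs, labels)]
    n_keep = max(1, int(len(losses) * keep_ratio))
    # Rank sample indices by loss, hardest (largest loss) first.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    hard = ranked[:n_keep]
    mean_hard_loss = sum(losses[i] for i in hard) / n_keep
    return hard, mean_hard_loss


# Toy batch (made-up values): class probabilities for four frames, two classes.
batch_probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.55, 0.45]]
labels = [0, 0, 1, 1]
hard_idx, hard_loss = mine_hard_examples(batch_probs, labels, keep_ratio=0.5)
```

In this toy batch the second and fourth frames have the lowest probability on their true class, so they are the ones retained. In a real training loop the mean loss over the retained samples would be the quantity that is backpropagated.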
