Raw Ultrasound-based Phonetic Segments Classification Via Mask Modeling

Kang You (Shanghai Jiao Tong University); Bo Liu (National University of Defense Technology); Kele Xu (National Key Laboratory of Parallel and Distributed Processing (PDL)); Yunsheng Xiong (National University of Defense Technology); Qisheng Xu (National University of Defense Technology); Ming Feng (Tongji University); Tamás G Csapó (Budapest University of Technology and Economics); Boqing Zhu (National University of Defense Technology)

06 Jun 2023

Ultrasound tongue imaging (UTI) is widely used in clinical linguistics and phonetics. Recently, deep neural networks, especially convolutional neural networks, have been widely applied to the interpretation and analysis of ultrasound tongue images. Although they achieve satisfactory performance, these methods rely on a large amount of manually labeled data, which is often difficult to obtain in practical settings. To address this issue, this paper focuses on how to use a large amount of unlabeled UTI data to improve performance on the UTI classification task. Specifically, we explore self-supervised learning with masking strategies. By predicting the masked part of the input, the pre-training stage enables the neural network to infer contextual information. We then fine-tune the pre-trained model with a small amount of labeled data. Compared with previous competing algorithms, our method improves classification accuracy by an average of 13.33% across four different scenarios.
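
The masking idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the masking strategy, patch size, mask ratio, or network used in the paper, so everything below is an assumption made for illustration: the function names `random_patch_mask` and `masked_mse`, the 8-pixel patches, the 75% mask ratio, and the 64 x 128 frame size are all hypothetical. The sketch hides a random subset of patches in a single 2-D frame and scores a reconstruction only on the hidden region, which is the general shape of a masked-prediction pretext objective.

```python
import numpy as np


def random_patch_mask(frame, patch=8, mask_ratio=0.75, rng=None):
    """Zero out a random subset of non-overlapping square patches.

    Hypothetical helper: patch size and mask ratio are illustrative
    assumptions, not values taken from the paper.
    Returns the masked frame and a boolean patch grid (True = masked).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = frame.shape
    if h % patch or w % patch:
        raise ValueError("frame dimensions must be multiples of patch")
    gh, gw = h // patch, w // patch
    n_patches = gh * gw
    n_masked = int(round(mask_ratio * n_patches))

    # Choose which patches to hide, without replacement.
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True
    grid = flat.reshape(gh, gw)

    # Expand the patch grid to pixel resolution and blank the hidden pixels.
    pixel_mask = np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)
    masked = np.where(pixel_mask, 0.0, frame)
    return masked, grid


def masked_mse(prediction, target, grid, patch=8):
    """Mean squared error computed only over the masked pixels."""
    pixel_mask = np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)
    diff = prediction[pixel_mask] - target[pixel_mask]
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    # Synthetic stand-in for one ultrasound frame (assumed size 64 x 128).
    frame = np.random.default_rng(0).random((64, 128))
    masked, grid = random_patch_mask(frame, rng=np.random.default_rng(1))
    # A model would be trained to predict `frame` from `masked`; here the
    # masked input itself is scored as a trivial baseline "prediction".
    print("masked patches:", int(grid.sum()), "of", grid.size)
    print("baseline loss:", masked_mse(masked, frame, grid))
```

In a pipeline of the kind the abstract describes, an encoder pre-trained with such a reconstruction loss on unlabeled frames would then be fine-tuned on the small labeled set; that step is omitted here because the abstract gives no architectural details.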

Tags:

Speech production, perception and psychoacoustics

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
