Joint unsupervised and supervised learning for context-aware language identification

Jinseok Park (42dot); Hyung Yong Kim (42dot); Jihwan Park (42dot Inc.); Byeong-Yeol Kim (42dot); Shukjae Choi (Hyundai Motor Company); Yunkyu Lim (42dot)

07 Jun 2023

Language identification (LID) automatically recognizes the language of a spoken utterance. According to recent studies, LID models trained with an automatic speech recognition (ASR) task perform better than those trained with an LID task only. However, training the model to recognize speech requires additional text labels, and acquiring those labels is costly. To overcome this problem, we propose context-aware language identification that combines unsupervised and supervised learning without any text labels. The proposed method learns the context of speech through a masked language modeling (MLM) loss and is simultaneously trained to determine the language of the utterance with a supervised learning loss. The proposed joint learning reduced the error rate by 15.6% compared to a model of the same structure trained with supervised learning only, on a subset of the VoxLingua107 dataset consisting of sub-three-second utterances in 11 languages.
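As a rough illustration of the joint objective described in the abstract, the sketch below adds an MLM-style cross-entropy, computed only on masked frames, to a supervised cross-entropy on the utterance-level language label. It is a minimal sketch under stated assumptions, not the authors' implementation: the discrete frame targets, the `mlm_weight` hyperparameter, and the function names `cross_entropy` and `joint_loss` are all hypothetical and do not come from the abstract.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def joint_loss(frame_logits, frame_targets, masked,
               lang_logits, lang_label, mlm_weight=1.0):
    """Combine an MLM-style loss with a supervised LID loss.

    frame_logits  : per-frame scores over a discrete target vocabulary
                    (assumption: masked frames are predicted as class IDs)
    frame_targets : target class ID for each frame
    masked        : booleans marking which frames were masked
    lang_logits   : utterance-level scores over the candidate languages
    lang_label    : index of the true language
    mlm_weight    : hypothetical weight balancing the two terms
    """
    masked_idx = [i for i, is_masked in enumerate(masked) if is_masked]
    # Unsupervised term: average cross-entropy over masked frames only.
    mlm = sum(cross_entropy(frame_logits[i], frame_targets[i])
              for i in masked_idx) / max(len(masked_idx), 1)
    # Supervised term: cross-entropy on the language label.
    lid = cross_entropy(lang_logits, lang_label)
    return lid + mlm_weight * mlm, mlm, lid


if __name__ == "__main__":
    # Toy inputs: 3 frames, 4 target classes, 11 candidate languages.
    frame_logits = [[0.0] * 4, [0.0] * 4, [5.0, 0.0, 0.0, 0.0]]
    frame_targets = [1, 2, 0]
    masked = [True, False, False]
    total, mlm, lid = joint_loss(frame_logits, frame_targets, masked,
                                 [0.0] * 11, 3, mlm_weight=0.5)
    print(total, mlm, lid)
```

Both terms are computed from the same forward pass, so neither needs text transcripts: the MLM term uses only the audio, and the supervised term uses only the language label.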

Tags:

Word spotting, VAD, and other topics in speech recognition

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
