Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models

Takanori Ashihara (NTT Corp.); Takafumi Moriya (NTT); Kohei Matsuura (NTT); Tomohiro Tanaka (NTT Corp.)

06 Jun 2023

Self-supervised learning (SSL) has been dramatically successful not only in monolingual but also in cross-lingual settings. However, because the two settings have generally been studied separately, little research has examined how effective a cross-lingual model is compared with a monolingual one. In this paper, we investigate this fundamental question empirically on Japanese automatic speech recognition (ASR) tasks. First, we compare the ASR performance of cross-lingual and monolingual models on tasks in two different languages while keeping the acoustic domain as identical as possible. Then, we examine how much unlabeled Japanese data is needed to achieve performance comparable to that of a cross-lingual model pre-trained on tens of thousands of hours of English and/or multilingual data. Finally, we extensively investigate the effectiveness of SSL in Japanese and demonstrate state-of-the-art performance on multiple ASR tasks. Since there has been no comprehensive SSL study for Japanese, we hope this study will guide Japanese SSL research.

Tags: Self-supervised and semi-supervised learning

Conference: IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person).