INTERMEDIATE FINE-TUNING USING IMPERFECT SYNTHETIC SPEECH FOR IMPROVING ELECTROLARYNGEAL SPEECH RECOGNITION

Lester Phillip G Violeta (Nagoya University); Ding Ma (Nagoya University); Wen-Chin Huang (Nagoya University); Tomoki Toda (Nagoya University)

06 Jun 2023

Automatic speech recognition (ASR) for electrolaryngeal speakers remains relatively unexplored because the available datasets are small. When ASR training data is scarce, a large-scale pretraining and fine-tuning framework is often sufficient to achieve high recognition rates. For electrolaryngeal speech, however, the domain shift between the pretraining data and the fine-tuning data is too large to overcome, which limits the maximum improvement in recognition rates. To resolve this, we propose an intermediate fine-tuning step that uses imperfect synthetic speech to narrow the domain gap between the pretraining data and the target data. Although the synthetic data is imperfect, we show that the approach is effective on electrolaryngeal speech datasets, with improvements of 6.1% over the baseline that did not use imperfect synthetic speech. Results show that the intermediate fine-tuning stage focuses on learning the high-level inherent features of the imperfect synthetic data rather than low-level features such as intelligibility.
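The staging described in the abstract can be sketched as an ordered training schedule. This is a minimal illustration only, not the authors' implementation: the names `Stage`, `build_schedule`, `run_schedule`, and `record_stage`, as well as the dataset labels, are hypothetical, and the training function is a stand-in that merely records which stage ran.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

M = TypeVar("M")  # placeholder for whatever object represents the model


@dataclass(frozen=True)
class Stage:
    name: str     # label for the training stage
    dataset: str  # hypothetical dataset identifier, not from the paper


def build_schedule(use_intermediate: bool = True) -> List[Stage]:
    """Return the training stages in the order they are applied."""
    stages = [Stage("pretraining", "large_scale_pretraining_data")]
    if use_intermediate:
        # The proposed extra step: fine-tune on imperfect synthetic speech
        # before touching the small target dataset.
        stages.append(
            Stage("intermediate_fine_tuning", "imperfect_synthetic_speech")
        )
    stages.append(Stage("fine_tuning", "electrolaryngeal_speech"))
    return stages


def run_schedule(
    model: M, stages: List[Stage], train_fn: Callable[[M, Stage], M]
) -> M:
    """Apply train_fn once per stage, passing the model from stage to stage."""
    for stage in stages:
        model = train_fn(model, stage)
    return model


def record_stage(history: List[str], stage: Stage) -> List[str]:
    """Stand-in for real training: append the stage name to a history list."""
    return history + [stage.name]


proposed = run_schedule([], build_schedule(use_intermediate=True), record_stage)
baseline = run_schedule([], build_schedule(use_intermediate=False), record_stage)
```

In this sketch the baseline differs from the proposed schedule only by the missing middle stage, which mirrors the comparison the abstract reports. A real system would replace `record_stage` with an actual training routine.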

Tags: New algorithms and approaches for speech recognition

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
