Enhancing Unsupervised Speech Recognition with Diffusion GANs

Xianchao Wu (NVIDIA Japan)


SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

08 Jun 2023

We enhance the vanilla adversarial training method for unsupervised Automatic Speech Recognition (ASR) with a diffusion-GAN. Our model (1) injects instance noise of various intensities into the generator's output and into unlabeled reference text, which is sampled from pretrained phoneme language models under a length constraint; (2) trains diffusion-timestep-dependent discriminators to separate the two; and (3) back-propagates the gradients to update the generator. Word and phoneme error rate comparisons with wav2vec-U on the Librispeech (3.1% on test-clean and 5.6% on test-other), TIMIT, and MLS datasets show that our enhancement strategies are effective.
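The three steps in the abstract can be illustrated with a minimal, framework-free sketch. It assumes a standard DDPM-style forward process with a linear beta schedule, represents the generator's output as per-frame phoneme probability vectors and the reference text as one-hot phoneme vectors, and uses a hypothetical logistic discriminator with one (w, b) pair per timestep (`toy_discriminator`). All function names, the schedule, and the discriminator are illustrative assumptions, not the paper's implementation; the sketch also omits sampling from the phoneme language model, the length constraint, and the gradient update itself.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def one_hot(phones, vocab_size):
    """Encode a reference phoneme-ID sequence as per-position one-hot vectors."""
    return [[1.0 if v == p else 0.0 for v in range(vocab_size)] for p in phones]


def diffuse(x, t, alpha_bars, rng):
    """Forward diffusion: sqrt(alpha_bar_t) * x + sqrt(1 - alpha_bar_t) * Gaussian noise.
    Larger t means smaller alpha_bar_t, i.e. stronger instance noise."""
    a = alpha_bars[t]
    return [[math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in row]
            for row in x]


def toy_discriminator(x_t, t, weights):
    """Hypothetical timestep-dependent discriminator: a separate (w, b) per step t,
    applied to one crude feature (mean of the per-position maximum value)."""
    w, b = weights[t]
    feature = sum(max(row) for row in x_t) / len(x_t)
    return 1.0 / (1.0 + math.exp(-(w * feature + b)))


def diffusion_gan_losses(gen_probs, ref_phones, vocab_size, alpha_bars, weights, rng):
    """Noise real and generated inputs with the same random t; return (t, d_loss, g_loss)."""
    t = rng.randrange(len(alpha_bars))
    real_t = diffuse(one_hot(ref_phones, vocab_size), t, alpha_bars, rng)
    fake_t = diffuse(gen_probs, t, alpha_bars, rng)
    d_real = toy_discriminator(real_t, t, weights)
    d_fake = toy_discriminator(fake_t, t, weights)
    eps = 1e-8  # avoids log(0)
    d_loss = -math.log(d_real + eps) - math.log(1.0 - d_fake + eps)
    g_loss = -math.log(d_fake + eps)  # non-saturating generator objective
    return t, d_loss, g_loss


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size, num_steps = 4, 10
    alpha_bars = alpha_bar_schedule(num_steps)
    weights = [(2.0, -1.0)] * num_steps  # hypothetical, untrained parameters
    gen_probs = [[0.25] * vocab_size for _ in range(3)]  # a maximally uncertain generator
    print(diffusion_gan_losses(gen_probs, [0, 2, 1], vocab_size, alpha_bars, weights, rng))
```

Both sides are noised with the same timestep t so that each timestep-specific discriminator always compares inputs of equal noise intensity. In a real system, `g_loss` would be back-propagated through the noised generator output into the generator's parameters, which requires an autodiff framework; this stdlib-only sketch computes the forward losses only.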

Tags:

language modeling

Value-Added Bundle(s) Including this Product

26 Apr 2024

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

16 Dec 2023

Large-Scale and Parameter-Efficient Language Modeling for Speech Processing

09 Jun 2023

HAG: Hierarchical Attention with Graph Network for Dialogue Act Classification in Conversation

07 Jun 2023

Learning to Build Reasoning Chains by Reliable Path Retrieval