CTCBERT: ADVANCING HIDDEN-UNIT BERT WITH CTC OBJECTIVES

Ruchao Fan (University of California, Los Angeles); Yiming Wang (Microsoft Corporation); Yashesh Gaur (Microsoft); Jinyu Li (Microsoft)

06 Jun 2023

In this work, we present a simple but effective method, CTCBERT, for advancing hidden-unit BERT (HuBERT). HuBERT applies a frame-level cross-entropy (CE) loss, as in most acoustic model training. CTCBERT instead trains the model with the Connectionist Temporal Classification (CTC) objective after removing duplicated IDs in each masked region. The idea stems from the observation that alignments obtained from clustered or aligned IDs can contain significant errors. Because CTC learns alignments implicitly, training with CTC can be more flexible when such misalignment exists. We examine CTCBERT on IDs from HuBERT Iter1, HuBERT Iter2, and PBERT. CTC training brings consistent improvements over CE training. Furthermore, loading the blank-related parameters during finetuning yields slight additional improvements. Evaluated on the LibriSpeech 960-100h setting, CTCBERT achieves relative WER improvements of 2%-11% over HuBERT and PBERT on test-other data.
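The target preparation described in the abstract, removing duplicated IDs in each masked region before applying CTC, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `masked_regions` and `ctc_targets`, the boolean-mask representation, and the example IDs are assumptions made for illustration, and the sketch covers only the construction of target sequences, not the CTC loss itself.

```python
from itertools import groupby


def masked_regions(mask):
    """Yield (start, end) index pairs for each run of consecutive True values.

    `mask` is a hypothetical per-frame boolean list; `end` is exclusive.
    """
    start = None
    for i, is_masked in enumerate(mask):
        if is_masked and start is None:
            start = i
        elif not is_masked and start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, len(mask)


def ctc_targets(frame_ids, mask):
    """Collapse consecutive duplicate IDs inside each masked region.

    Returns one target sequence per masked region (an assumption about
    how the regions are grouped; the abstract does not specify this).
    """
    targets = []
    for start, end in masked_regions(mask):
        region = frame_ids[start:end]
        # groupby merges runs of identical neighbouring IDs into one key.
        targets.append([unit for unit, _ in groupby(region)])
    return targets


# Illustrative frame-level IDs (e.g. cluster indices) and a frame mask.
frame_ids = [7, 7, 3, 3, 3, 5, 5, 2, 2, 2, 2, 7]
mask = [False, True, True, True, True, True,
        False, False, True, True, False, False]

print(ctc_targets(frame_ids, mask))  # [[7, 3, 5], [2]]
```

With frame-level CE, every masked frame would be paired with its own ID. Here each masked region is paired only with its shortened, duplicate-free sequence, which leaves the frame-to-ID alignment for CTC to learn.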

Tags:

Large vocabulary continuous speech recognition/search

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
