A Reality Check and A Practical Baseline for Semantic Speech Embedding

Guangyu Chen (Renmin University of China); Yuanyuan Cao (Renmin University of China)


07 Jun 2023

Generating spoken word embeddings that carry semantic information has attracted considerable research interest. Among these efforts, Speech2Vec, one of the most influential works, reported impressive results, surpassing Word2Vec on word similarity benchmarks. However, since that breakthrough in 2017, the field seems to have stalled: there have been no subsequent comparisons, no successors, and not even a successful replication. We suspect that Speech2Vec may have been overestimated, because intrinsic interference between phonetics and semantics prevents the model from learning effective semantic embeddings. In this study, we first examined the authenticity of the Speech2Vec results. Evidence from the embedding properties and vocabulary composition suggested that the claimed results may have been erroneously produced by a text-based model. In addition, we reproduced the Speech2Vec model and reported replicable results to set a practical baseline for future work. Our code and data are available.

Tags:

Acoustic modeling for automatic speech recognition

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
