A Reality Check and A Practical Baseline for Semantic Speech Embedding
Guangyu Chen (Renmin University of China); Yuanyuan Cao (Renmin University of China)
SPS
Generating spoken word embeddings that carry semantic information has attracted considerable research interest. Among these efforts, Speech2Vec, one of the most influential works, reported impressive results, surpassing Word2Vec on word similarity benchmarks. However, since that breakthrough in 2017, the field appears to have stalled: there have been no subsequent comparisons, no successors, and not even a successful replication. We suspect that Speech2Vec may be overestimated, because intrinsic interference between phonetics and semantics prevents the model from learning effective semantic embeddings. In this study, we first examined the authenticity of Speech2Vec. Evidence from embedding properties and vocabulary composition suggested that the claimed results may have been mistakenly produced by a text-based model. In addition, we reproduced the Speech2Vec model and reported replicable results to set a practical baseline for future work. Our code and data are available.
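Because the central comparison rests on word similarity benchmarks, a minimal sketch of the usual evaluation protocol may help: each benchmark pair is scored by the cosine similarity of its two word vectors, and the model is judged by Spearman's rank correlation between those scores and human ratings. This is an assumed, generic protocol, not the authors' released code; the function names and the toy vectors below are hypothetical. The sketch also reports how many pairs were actually scored, since pairs with out-of-vocabulary words are skipped and vocabulary composition therefore changes which pairs count.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


def evaluate_word_similarity(embeddings, pairs):
    """Score embeddings on (word1, word2, human_score) triples.

    Pairs containing an out-of-vocabulary word are skipped; the number of
    pairs actually scored is returned alongside rho.
    """
    model_scores, human_scores = [], []
    for w1, w2, gold in pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(gold)
    return spearman(model_scores, human_scores), len(model_scores)


# Hypothetical toy data: 2-D vectors and made-up human ratings.
toy_embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "truck": [0.2, 0.8],
}
toy_pairs = [
    ("cat", "dog", 9.0),
    ("car", "truck", 8.5),
    ("cat", "car", 1.0),
    ("dog", "zebra", 7.0),  # "zebra" is out of vocabulary, so this pair is skipped
]

rho, n_scored = evaluate_word_similarity(toy_embeddings, toy_pairs)
print(f"Spearman rho = {rho:.3f} over {n_scored} of {len(toy_pairs)} pairs")
```

Reporting the pair count next to rho matters when comparing models with different vocabularies: two correlations computed over different subsets of the same benchmark are not directly comparable.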