A Reality Check and A Practical Baseline for Semantic Speech Embedding

Guangyu Chen (Renmin University of China); Yuanyuan Cao (Renmin University of China)

07 Jun 2023

Generating spoken word embeddings that carry semantic information has attracted considerable research interest. Speech2Vec, one of the most influential works in this area, reported impressive results that surpassed Word2Vec on word similarity benchmarks. However, since that breakthrough in 2017, the field seems to have stalled: there have been no subsequent comparisons, no successors, and not even a successful replication. We suspect that Speech2Vec may be overestimated, because intrinsic interference between phonetics and semantics prevents the model from learning effective semantic embeddings. In this study, we first examine the authenticity of the Speech2Vec results. Evidence from embedding properties and vocabulary composition suggests that the claimed results may have been wrongly produced by a text-based model. In addition, we reproduce the Speech2Vec model and report replicable results to set a practical baseline for future work. Our code and data are available.
