Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling

Amitay Sicherman (The Hebrew University of Jerusalem); Yossi Adi (Facebook AI Research)

07 Jun 2023

This work presents an in-depth analysis of discrete self-supervised speech representations (units) from the perspective of Generative Spoken Language Modeling (GSLM). Based on the findings of this analysis, we propose practical improvements to the discrete units used in GSLM. First, we analyze these units along three axes: interpretation, visualization, and resynthesis. Our analysis finds a high correlation between the speech units and both phonemes and phoneme families, while their correlation with speaker or gender is weaker. Additionally, we find redundancies in the extracted units and argue that one reason may be the units' context. Following this analysis, we propose a new unsupervised metric to measure unit redundancies. Finally, we use this metric to develop new methods that improve the robustness of unit clustering and show significant improvements on zero-resource speech metrics such as ABX. Code and analysis tools are available at https://github.com/slp-rl/SLM-Discrete-Representations.
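The abstract reports a high correlation between units and phonemes. One common way to quantify this kind of unit-phoneme correlation (used, for example, in the HuBERT paper) is to compute phone purity, cluster purity and phone-normalized mutual information (PNMI) over frame-aligned unit and phoneme labels. The sketch below is a minimal illustration under two assumptions: frame-level phoneme alignments are available, and the unit and phoneme sequences have the same length. The function name `unit_phone_stats` is hypothetical, and this is not necessarily the exact analysis performed in the paper.

```python
import math
from collections import Counter


def unit_phone_stats(units, phones):
    """Hypothetical helper: correlation statistics between discrete units and phonemes.

    Assumes `units` and `phones` are frame-aligned sequences of equal length
    (one unit ID and one phoneme label per frame).
    """
    if len(units) != len(phones) or not units:
        raise ValueError("units and phones must be non-empty and frame-aligned")
    n = len(units)
    joint = Counter(zip(phones, units))  # counts of (phone, unit) pairs
    phone_counts = Counter(phones)
    unit_counts = Counter(units)

    # Phone purity: for each unit, count the frames of its most frequent phone.
    best_phone_per_unit = Counter()
    # Cluster purity: for each phone, count the frames of its most frequent unit.
    best_unit_per_phone = Counter()
    for (phone, unit), count in joint.items():
        best_phone_per_unit[unit] = max(best_phone_per_unit[unit], count)
        best_unit_per_phone[phone] = max(best_unit_per_phone[phone], count)
    phone_purity = sum(best_phone_per_unit.values()) / n
    cluster_purity = sum(best_unit_per_phone.values()) / n

    # Mutual information I(phone; unit) in nats.
    mutual_info = 0.0
    for (phone, unit), count in joint.items():
        p_joint = count / n
        p_phone = phone_counts[phone] / n
        p_unit = unit_counts[unit] / n
        mutual_info += p_joint * math.log(p_joint / (p_phone * p_unit))

    # Phone entropy H(phone); PNMI = I(phone; unit) / H(phone).
    phone_entropy = -sum((c / n) * math.log(c / n) for c in phone_counts.values())
    pnmi = mutual_info / phone_entropy if phone_entropy > 0 else 0.0

    return {
        "phone_purity": phone_purity,
        "cluster_purity": cluster_purity,
        "pnmi": pnmi,
    }


if __name__ == "__main__":
    # Toy frame-level labels, made up purely for illustration.
    units = [3, 3, 7, 7, 7, 1]
    phones = ["a", "a", "b", "b", "a", "c"]
    print(unit_phone_stats(units, phones))
```

Higher phone purity means each unit tends to map to a single phoneme, while PNMI, bounded in [0, 1], measures how much of the phoneme uncertainty the units explain. The same function could be applied with speaker or gender labels in place of phonemes to compare those correlations.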

Tags:

language modeling

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and in-person conference.

More Like This

Large-Scale and Parameter-Efficient Language Modeling for Speech Processing

HAG: Hierarchical Attention with Graph Network for Dialogue Act Classification in Conversation

Enhancing Unsupervised Speech Recognition with Diffusion GANs