AN ISOTROPY ANALYSIS FOR SELF-SUPERVISED ACOUSTIC UNIT EMBEDDINGS ON THE ZERO RESOURCE SPEECH CHALLENGE 2021 FRAMEWORK
Jianan Chen (Japan Advanced Institute of Science and Technology); Sakriani Sakti (Japan Advanced Institute of Science and Technology)
In recent years, self-supervised representation learning has gained much attention for its proven advantages in many downstream tasks, and various self-supervised representation learning methods have been developed. However, few studies have investigated the resulting embedding spaces or analyzed why one approach performs better than another. In this work, we investigate the geometry, in terms of isotropy, of embedding spaces learned by self-supervised speech representation models, which can influence the ability to discriminate acoustic units on the Zero Resource Speech Challenge 2021 (ZR2021) framework. Most top systems among the published ZR2021 results are based on the contrastive predictive coding (CPC) technique. Here, we propose using hidden-unit BERT (HuBERT) self-supervised representation learning, and we provide detailed analyses and comparisons of the isotropy of both models’ embedding spaces, which may influence performance. Furthermore, we use a simple yet effective feature fusion technique to combine both models’ strengths, reducing the ABX error rate and outperforming the top models on the ZR2021 dev-other dataset.
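The abstract refers to measuring the isotropy of the learned embedding spaces. As a rough illustration of what such a measurement can look like (not necessarily the exact metric or models used in this paper), the sketch below computes two commonly used isotropy indicators, the partition-function ratio I(W) of Mu & Viswanath and the mean pairwise cosine similarity, on a stand-in matrix that plays the role of frame-level CPC or HuBERT embeddings.

```python
# Minimal sketch of two common isotropy indicators for an embedding matrix.
# The random matrix below is only a stand-in for frame-level speech embeddings;
# the paper's own metric and model outputs are not reproduced here.
import numpy as np

def isotropy_score(embeddings: np.ndarray) -> float:
    """I(W) = min_c Z(c) / max_c Z(c), where Z(c) = sum_i exp(c^T w_i)
    and c ranges over the eigenvectors of W^T W. A value near 1 indicates
    a nearly isotropic space; a value near 0 indicates strong anisotropy."""
    # Center so the ratio is not dominated by a common mean offset
    # (a small simplification relative to the original formulation).
    W = embeddings - embeddings.mean(axis=0, keepdims=True)
    _, eigvecs = np.linalg.eigh(W.T @ W)      # candidate unit directions c
    Z = np.exp(W @ eigvecs).sum(axis=0)       # partition function per direction
    return float(Z.min() / Z.max())

def mean_cosine_similarity(embeddings: np.ndarray, n_pairs: int = 10_000,
                           seed: int = 0) -> float:
    """Average cosine similarity over random embedding pairs; values close to 0
    suggest a more isotropic, less 'cone-shaped' embedding space."""
    rng = np.random.default_rng(seed)
    a = embeddings[rng.integers(0, len(embeddings), n_pairs)]
    b = embeddings[rng.integers(0, len(embeddings), n_pairs)]
    cos = (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return float(cos.mean())

if __name__ == "__main__":
    # Stand-in for N frame-level embeddings of dimension D.
    fake_frames = np.random.randn(2000, 256)
    print("I(W):", isotropy_score(fake_frames))
    print("mean cosine similarity:", mean_cosine_similarity(fake_frames))
```

In practice, the embedding matrix would be built by running the pretrained CPC or HuBERT model over the ZR2021 audio and stacking the resulting frame vectors before applying measures of this kind.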