NSV-TTS: NON-SPEECH VOCALIZATION MODELING AND TRANSFER IN EMOTIONAL TEXT-TO-SPEECH
Haitong Zhang (NetEase Games AI Lab); Xinyuan Yu (NetEase Games AI Lab); Yue Lin (NetEase Games AI Lab)
SPS
This paper addresses non-speech vocalization (NSV) modeling and transfer in emotional text-to-speech (TTS). We propose an emotional TTS system, NSV-TTS, that models both NSV and emotional speech. The model uses self-supervised learning to extract unsupervised linguistic units (ULUs), which are used for NSV labeling and zero-shot NSV transfer. We further propose token mixing and random masking to improve performance. We evaluate the proposed method on various NSV types and emotion classes. The experimental results show that it performs well on the zero-shot NSV transfer task. Finally, we conduct ablation studies to investigate the proposed method further.
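The abstract does not specify how random masking is applied, so the following is only a minimal sketch of one plausible reading: each unit in a discrete ULU sequence is independently replaced by a mask symbol with some probability. The function name `random_mask`, the `MASK_ID` value, the masking probability, and the example unit IDs are all hypothetical and are not taken from the paper.

```python
import random

# Hypothetical placeholder ID for a masked unit (not from the paper).
MASK_ID = -1


def random_mask(units, mask_prob=0.15, seed=None):
    """Return a copy of `units` with each element independently masked.

    Assumption: masking is applied per token with probability `mask_prob`.
    The paper may use a different scheme (e.g. span masking).
    """
    rng = random.Random(seed)
    return [MASK_ID if rng.random() < mask_prob else u for u in units]


# Illustrative ULU sequence; the IDs are made up for this example.
units = [12, 12, 7, 7, 7, 33, 5, 5]
masked = random_mask(units, mask_prob=0.5, seed=0)
```

The output has the same length as the input, so any alignment between the units and the text or acoustic frames is preserved. Only the identity of some units is hidden.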