Emotional Speech Synthesis With Rich And Granularized Control
Se-Yun Um, Sangshin Oh, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang, Kyungguen Byun
SPS
Length: 11:58
This paper proposes an effective emotion control method for an end-to-end text-to-speech (TTS) system. To flexibly control the distinct characteristics of a target emotion category, it is essential to carefully determine the representative embedding vectors supplied to the TTS input. We apply an inter-to-intra emotional distance ratio algorithm to the embedding vectors, which minimizes the distance to the target emotion category while maximizing the distance to the other emotion categories. To further enhance the expressiveness of the target speech, we also introduce an interpolation technique that gradually changes the probability density function of the target emotion to that of neutral speech. Subjective evaluation results in terms of emotional expressiveness and controllability show the superiority of the proposed algorithm over conventional methods.
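The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of Euclidean distance, the choice to pick the representative from the existing target-category embeddings, and the linear blend of two vectors (a simplified stand-in for changing probability density functions) are all assumptions made for illustration.

```python
import numpy as np

def inter_to_intra_ratio(candidate, same, others, eps=1e-8):
    """Hypothetical score: mean distance to other-category embeddings
    divided by mean distance to same-category embeddings."""
    intra = np.linalg.norm(same - candidate, axis=1).mean()
    inter = np.linalg.norm(others - candidate, axis=1).mean()
    return inter / (intra + eps)

def pick_representative(embeddings, labels, target):
    """Return the target-category embedding with the largest ratio,
    i.e. close to its own category and far from the others."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    same = embeddings[labels == target]
    others = embeddings[labels != target]
    scores = [inter_to_intra_ratio(c, same, others) for c in same]
    return same[int(np.argmax(scores))]

def interpolate_toward_neutral(emotion_vec, neutral_vec, alpha):
    """Assumed simplification: linear blend of two vectors.
    alpha = 1.0 gives the full emotion, alpha = 0.0 gives neutral."""
    emotion_vec = np.asarray(emotion_vec, dtype=float)
    neutral_vec = np.asarray(neutral_vec, dtype=float)
    return neutral_vec + alpha * (emotion_vec - neutral_vec)

# Toy 2-D embeddings, invented purely for this example.
toy_embeddings = [[0.0, 0.0], [0.2, 0.0], [0.1, 0.0], [3.0, 0.0]]
toy_labels = ["happy", "happy", "happy", "sad"]
representative = pick_representative(toy_embeddings, toy_labels, "happy")
half_strength = interpolate_toward_neutral(representative, [1.0, 1.0], 0.5)
```

In this toy setup, sweeping `alpha` from 1.0 down to 0.0 would correspond to gradually weakening the emotion toward neutral speech, which is the kind of granular control the abstract describes.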