
Emotional Speech Synthesis With Rich And Granularized Control

Se-Yun Um, Sangshin Oh, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang, Kyungguen Byun

  • SPS
    Length: 11:58
04 May 2020

This paper proposes an effective emotion control method for an end-to-end text-to-speech (TTS) system. To flexibly control the distinctive characteristics of a target emotion category, it is essential to appropriately determine representative embedding vectors for the TTS input. We introduce an inter-to-intra emotional distance ratio algorithm for the embedding vectors that minimizes the distance to the target emotion category while maximizing the distance to the other emotion categories. To further enhance the expressiveness of the target speech, we also introduce an interpolation technique that gradually changes the probability density function of the target emotion into that of neutral speech. Subjective evaluations of emotional expressiveness and controllability show that the proposed algorithm outperforms conventional methods.
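The two ideas in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that the distance ratio is used to score candidate embeddings (mean distance to the other emotion categories divided by mean distance to the target category) and to pick the highest-scoring one as the representative vector. It also assumes that each emotion's embeddings follow a diagonal Gaussian, with "interpolation" taken to mean linearly blending per-dimension means and standard deviations. The function names `distance_ratio`, `select_representative`, and `interpolate_gaussian`, together with the toy embeddings, are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_ratio(candidate: Sequence[float], target: str,
                   embeddings: Dict[str, List[Vector]]) -> float:
    """Mean distance to the other categories divided by mean distance to the target.

    Hypothetical scoring rule: larger values mean the candidate is close to its
    own emotion category and far from all the others.
    """
    intra = [euclidean(candidate, e) for e in embeddings[target]]
    inter = [euclidean(candidate, e)
             for label, vecs in embeddings.items() if label != target
             for e in vecs]
    mean_intra = sum(intra) / len(intra)
    mean_inter = sum(inter) / len(inter)
    # Guard against a zero intra distance (e.g. a single-sample category).
    return mean_inter / max(mean_intra, 1e-12)


def select_representative(target: str,
                          embeddings: Dict[str, List[Vector]]) -> Vector:
    """Return the target-category embedding with the highest distance ratio."""
    return max(embeddings[target],
               key=lambda e: distance_ratio(e, target, embeddings))


def interpolate_gaussian(mu_emo: Sequence[float], sigma_emo: Sequence[float],
                         mu_neu: Sequence[float], sigma_neu: Sequence[float],
                         alpha: float) -> Tuple[Vector, Vector]:
    """Blend two diagonal Gaussians: alpha=1 gives the emotion, alpha=0 gives neutral.

    Assumption: means and standard deviations are interpolated linearly per dimension.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    mu = [alpha * m_e + (1.0 - alpha) * m_n for m_e, m_n in zip(mu_emo, mu_neu)]
    sigma = [alpha * s_e + (1.0 - alpha) * s_n
             for s_e, s_n in zip(sigma_emo, sigma_neu)]
    return mu, sigma


# Toy 2-D embeddings (made up for illustration only).
embeddings = {
    "happy": [[1.0, 0.0], [1.2, 0.1], [0.9, -0.1]],
    "sad": [[-1.0, 0.0], [-1.1, 0.2]],
    "neutral": [[0.0, 0.0], [0.1, 0.1]],
}
print(select_representative("happy", embeddings))  # [1.0, 0.0]
print(interpolate_gaussian([1.0, 0.0], [0.5, 0.5], [0.0, 0.0], [1.0, 1.0], 0.5))
# ([0.5, 0.0], [0.75, 0.75])
```

For diagonal Gaussians, linearly blending means and standard deviations traces the 2-Wasserstein interpolation between the two densities, so sweeping `alpha` from 1 to 0 moves the emotional distribution smoothly toward the neutral one. The paper's actual interpolation may differ.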
