SYNTACC: Synthesizing multi-accent speech by weight factorization
Tuan-Nam Nguyen (Karlsruhe Institute of Technology); Quan Pham (Karlsruhe Institute of Technology); Alexander Waibel (Karlsruhe Institute of Technology (KIT))
Conventional multi-speaker text-to-speech (TTS) synthesis can produce speech in multiple voices, yet it cannot generate speech in different accents. This limitation motivated us to develop SYNTACC (Synthesizing speech with accents), which adapts conventional multi-speaker TTS to produce multi-accent speech. Our method builds on the YourTTS model and introduces a novel multi-accent training mechanism: each weight matrix is decomposed into a shared component and an accent-dependent component, where the shared component is initialized from the pretrained multi-speaker TTS model and the accent-dependent component is factorized into vectors as a rank-1 matrix, reducing the number of trainable parameters per accent. This weight factorization proves effective for fine-tuning SYNTACC on multi-accent datasets under low-resource conditions. The resulting SYNTACC model synthesizes speech not only in different voices but also in different accents.
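As a rough illustration of the weight factorization idea, the PyTorch sketch below shows a linear layer whose effective weight is a shared matrix plus a per-accent rank-1 update built from two vectors. The class name, parameter shapes, and initialization are illustrative assumptions, not the authors' implementation (which operates inside the YourTTS architecture).

import torch
import torch.nn as nn

class AccentFactorizedLinear(nn.Module):
    # Effective weight for accent a: W_a = W_shared + u_a v_a^T (rank-1 update).
    def __init__(self, in_features, out_features, num_accents):
        super().__init__()
        # Shared component; in the paper this part would be initialized from the
        # pretrained multi-speaker TTS model.
        self.shared = nn.Linear(in_features, out_features)
        # Accent-dependent rank-1 factors: one (u, v) vector pair per accent.
        # v starts at zero so the initial update is zero and training starts
        # from the shared weights.
        self.u = nn.Parameter(0.01 * torch.randn(num_accents, out_features))
        self.v = nn.Parameter(torch.zeros(num_accents, in_features))

    def forward(self, x, accent_id):
        # Rank-1 accent-dependent correction to the shared weight matrix.
        delta = torch.outer(self.u[accent_id], self.v[accent_id])
        return self.shared(x) + x @ delta.t()

Because each accent only adds two vectors per weight matrix, the number of accent-specific parameters stays small, which is what makes fine-tuning on low-resource multi-accent data practical.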