The ThinkIT System for ICASSP2021 M2VoC Challenge

Zengqiang Shang, Haozhe Zhang, Ziyi Chen, Bolin Zhou, Pengyuan Zhang


SPS


Length: 00:09:24

07 Jun 2021

In this paper, we introduce the low-resource text-to-speech system submitted by the ThinkIT team to the Multi-Speaker Multi-Style Voice Cloning Challenge (M2VoC). The challenge has two tracks: the few-shot Track 1 provides 100 samples per speaker, and the one-shot Track 2 provides only 5. Each track contains two sub-tracks, A and B; unlike sub-track A, sub-track B allows extra public data in addition to the released data. We participate in sub-track A only. We adopt fine-tuning as our backbone strategy. Our submitted systems include a BERT-based prosody boundary prediction module, a FastSpeech-based acoustic model that generates acoustic features from text input, and a HiFi-GAN-based vocoder that generates waveforms from the acoustic features. Of these components, the acoustic model is the most prone to over-fitting on low-resource speakers. To prevent over-fitting, we modified the acoustic model and held out a validation set to assist manual model selection. Evaluation results provided by the challenge organizers demonstrate the effectiveness of our system.
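The abstract says only that a validation set was split out to assist manual model selection during fine-tuning; it gives no split ratio or selection criterion. The following is a minimal Python sketch of that general idea, not the authors' procedure. It assumes a hypothetical 10% hold-out (`val_fraction`) and ranks fine-tuning checkpoints by validation loss so that a listener only has to audition a short list (`top_k`). Both function names and parameters are illustrative.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_train_val(utterance_ids: Sequence[str], val_fraction: float = 0.1,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    """Hold out a small, reproducible validation subset of one speaker's data.

    Hypothetical helper: at least one utterance is held out and at least one
    is kept for fine-tuning, so it also works for very small sets.
    """
    if len(utterance_ids) < 2:
        raise ValueError("need at least two utterances to hold one out")
    ids = list(utterance_ids)
    random.Random(seed).shuffle(ids)  # fixed seed -> same split every run
    n_val = min(max(1, round(len(ids) * val_fraction)), len(ids) - 1)
    return ids[n_val:], ids[:n_val]   # (train_ids, val_ids)


def rank_checkpoints(val_losses: Dict[int, float], top_k: int = 3) -> List[int]:
    """Return the training steps of the top_k checkpoints by validation loss.

    Hypothetical helper: the result is a shortlist for manual (listening-based)
    selection, not an automatic final choice.
    """
    ranked = sorted(val_losses.items(), key=lambda kv: kv[1])  # lowest loss first
    return [step for step, _ in ranked[:top_k]]


if __name__ == "__main__":
    train_ids, val_ids = split_train_val([f"utt{i:03d}" for i in range(100)])
    print(len(train_ids), len(val_ids))  # 90 10
    # Made-up losses: validation loss rising after step 3000 suggests over-fitting.
    print(rank_checkpoints({1000: 0.52, 2000: 0.41, 3000: 0.38, 4000: 0.44}, top_k=2))
```

With a 5-sample set, the same split holds out a single utterance, so its validation loss is a noisy signal; this fits using it as an aid to listening rather than as an automatic stopping rule.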

Tags:

signal processing society

IEEE icassp 2021

virtual conference

2021

sps

virtual conference icassp 2021

june 6-11 2021

icassp 2021


