Choir Singing Synthesis For Rehearsal Tools With Large-Scale Multilingual Repertoires
Jordi Janer, Álvaro Sarasúa, Oscar Mayor, Jordi Bonada, Merlijn Blaauw
SPS
Recent advances in the quality of Deep Learning (DL) based Text-to-Speech (TTS) synthesis have had a large impact, owing to the technology's wide range of industrial applications and the accompanying social awareness. In the music domain, Singing Voice Synthesis (SVS) has benefited from similar advances in realism and expressive control. SVS approaches must account for musical constraints, and specific datasets need to be recorded and annotated. Although virtual singers are not new, DL-based systems that generate synthetic media are now gaining attention in academia, industry, and media production. One of the renowned research labs working on singing technologies is the MTG-UPF in Barcelona. Together with its spin-off company Voctro Labs, it has developed a Choir Singing Synthesis system that models a professional choir of 16 singers in multiple languages. The system combines novel work on F0 modelling introduced at ICASSP 2020 with the Neural Parametric Singing Synthesis (NPSS) algorithm developed by the coauthors. It takes notes and the corresponding lyrics as input and generates synthetic output for different voices. It includes a hybrid DNN-parametric F0 model trained on studio recordings of singers in four registers (soprano, alto, tenor and bass), capturing register-specific intonation patterns. Timbre is also modelled independently for the four registers, allowing realistic replicas of each voice type and pitch range to be synthesized. A multilingual dataset was recorded in four languages, thanks to the diction proficiency of professional singers in the Western choral repertoire. This demo shows an interactive web-based prototype. Attendees will have the opportunity to listen to the realistic synthetic choir (e.g. https://soundcloud.com/phonos-upf/demo-2) and to practice their singing skills with the interactive rehearsal tool (https://trompa.netlify.com/). This work is partially funded by the TROMPA EU project (Grant No. 770376).
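To make the score-to-choir pipeline described above concrete, the following is a minimal, hypothetical sketch in Python. It is not the authors' system: naive sine-wave voices stand in for the neural F0 and timbre models, and all names (`REGISTERS`, `render_voice`, `render_choir`, the semitone offsets) are illustrative assumptions. It only shows the overall interface: a note list goes in, one voice is rendered per register, and the registers are mixed into a choir.

```python
import numpy as np

SR = 22050  # sample rate in Hz (illustrative choice)

# Illustrative transposition offsets in semitones for the four registers;
# the real system uses separately trained F0 and timbre models per register.
REGISTERS = {"soprano": 12, "alto": 5, "tenor": -7, "bass": -19}

def midi_to_hz(midi_pitch: float) -> float:
    """Convert a MIDI note number to a frequency in Hz."""
    return 440.0 * 2.0 ** ((midi_pitch - 69) / 12.0)

def render_voice(notes, transpose=0):
    """Render one naive sine 'voice' for a list of (midi_pitch, duration_sec) notes.
    Stand-in for a per-register neural F0 + timbre model."""
    parts = []
    for pitch, dur in notes:
        t = np.arange(int(dur * SR)) / SR
        parts.append(np.sin(2 * np.pi * midi_to_hz(pitch + transpose) * t))
    return np.concatenate(parts)

def render_choir(notes):
    """Mix one voice per register into a single choir track."""
    voices = [render_voice(notes, offset) for offset in REGISTERS.values()]
    return np.mean(voices, axis=0)

# A toy three-note score: (MIDI pitch, duration in seconds).
score = [(60, 0.5), (62, 0.5), (64, 1.0)]
audio = render_choir(score)  # 2.0 seconds of mixed four-register audio
```

In the actual system each register would be synthesized by its own trained model conditioned on both notes and lyrics, but the flow (score in, per-register rendering, final mix) follows the description in the abstract.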