Prosody is Not Identity: A Speaker Anonymization Approach Using Prosody Cloning
Sarina Meyer (University of Stuttgart); Florian Lux (University of Stuttgart); Julia Koch (University of Stuttgart); Pavel Denisov (University of Stuttgart); Pascal Tilli (University of Stuttgart); Ngoc Thang Vu (University of Stuttgart)
Prosody is closely linked to the identity of a speaker, leading to individual pitch and intonation patterns. In speaker anonymization, it is therefore challenging to generate speech utterances that both keep the main prosodic structure of the original audio and preserve the speaker's privacy. In this paper, we present a system that extends a speech-to-text-to-speech anonymization pipeline with prosody cloning and show how to control the cloning by multiplying the pitch and energy sequences by random offset values. Using automatic and human evaluation, we find that this combination successfully overcomes the privacy-utility trade-off for prosody, achieving both high privacy and high pitch correlation scores. At the same time, the anonymized utterances reproduce the original voice distinctiveness and content with high intelligibility and only a small loss in naturalness, making them suitable for downstream applications.
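The random modification of pitch and energy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the random values are sampled or whether one value is drawn per frame or per utterance, so the per-frame sampling, the range of 0.9 to 1.1, the toy contours, and the function names `scale_contour` and `pearson` are all assumptions made for illustration.

```python
import math
import random


def scale_contour(values, low, high, rng):
    """Multiply each frame by its own random factor drawn from [low, high].

    Assumption: per-frame sampling from a uniform range; the abstract does
    not specify the sampling scheme.
    """
    return [v * rng.uniform(low, high) for v in values]


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Toy frame-level contours (hypothetical values, not data from the paper).
pitch = [110.0, 118.0, 131.0, 140.0, 126.0, 112.0, 105.0, 98.0]
energy = [0.42, 0.55, 0.71, 0.80, 0.63, 0.50, 0.38, 0.30]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
anon_pitch = scale_contour(pitch, 0.9, 1.1, rng)
anon_energy = scale_contour(energy, 0.9, 1.1, rng)

# Pitch correlation between original and modified contour, the kind of
# utility score the abstract refers to.
pitch_corr = pearson(pitch, anon_pitch)
print(f"pitch correlation: {pitch_corr:.3f}")
```

A narrow factor range keeps the overall contour shape, and thus the pitch correlation, close to the original, while a wider range moves the prosody further from the source speaker. Where to set that range is exactly the privacy-utility question the abstract addresses, and the values here are placeholders.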