Prosody is Not Identity: A Speaker Anonymization Approach Using Prosody Cloning

Sarina Meyer (University of Stuttgart); Florian Lux (University of Stuttgart); Julia Koch (University of Stuttgart); Pavel Denisov (University of Stuttgart); Pascal Tilli (University of Stuttgart); Ngoc Thang Vu (University of Stuttgart)

07 Jun 2023

Prosody is closely linked to the identity of a speaker, leading to individual pitch and intonation patterns. Therefore, it is challenging in speaker anonymization to generate speech utterances that both keep the original audio's main prosodic structure and preserve the speaker's privacy. In this paper, we present a system that extends a speech-to-text-to-speech anonymization pipeline with prosody cloning and show how to control the cloning by multiplying pitch and energy sequences by random offset values. Using automatic and human evaluation, we find this combination to successfully overcome the privacy-utility trade-off for prosody by achieving high privacy and high pitch correlation scores. At the same time, the anonymized utterances prove to reproduce the original voice distinctiveness and content with high intelligibility and only a small loss in naturalness, making them suitable for downstream applications.
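The offset idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-value uniform sampling, the 0.9 to 1.1 range, and the toy contours are all assumptions made for illustration, since the abstract does not say how the offsets are drawn or at what granularity they are applied. The sketch scales a pitch and an energy sequence by random multiplicative factors and then measures how closely the modified pitch contour still follows the original using Pearson correlation.

```python
import math
import random


def apply_random_offsets(values, low, high, rng):
    """Multiply each value by its own factor drawn uniformly from [low, high].

    Assumption: one independent factor per value; the abstract does not
    specify the sampling scheme or granularity.
    """
    return [v * rng.uniform(low, high) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Toy contours (made-up numbers, not data from the paper).
pitch = [110.0, 125.0, 140.0, 132.0, 118.0, 105.0, 98.0, 120.0]  # Hz
energy = [0.42, 0.55, 0.61, 0.58, 0.47, 0.39, 0.35, 0.50]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
anon_pitch = apply_random_offsets(pitch, 0.9, 1.1, rng)
anon_energy = apply_random_offsets(energy, 0.9, 1.1, rng)

print("pitch correlation:", pearson(pitch, anon_pitch))
print("energy correlation:", pearson(energy, anon_energy))
```

Under these assumptions, a narrower offset range keeps the modified contour closer to the original, while a wider range distorts it more. A single positive factor applied to a whole sequence would leave the Pearson correlation unchanged, so the granularity of the offsets matters for how much prosodic detail is altered.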

Tags: Anonymization and data privacy

Conference: IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person)
