The Sound Of My Voice: Speaker Representation Loss For Target Voice Separation

Seongkyu Mun, Soyeon Choe, Jaesung Huh, Joon Son Chung

Length: 12:30

04 May 2020

Content and style representations have been widely studied in the field of style transfer. In this paper, we propose a new loss function for audio source separation that uses speaker content representation, which we call speaker representation loss. The objective is to extract the target speaker's voice from the noisy input and also to remove it from the residual components. Compared with conventional spectral reconstruction, our proposed framework maximizes the use of target speaker information by minimizing the distance between the speaker representations of the reference and of the source separation output. We also propose a triplet speaker representation loss as an additional criterion to remove the target speaker information from the residual spectrogram output. The VoiceFilter framework is adopted to evaluate source separation performance on the VCTK database, and we achieve improved performance compared with the baseline loss function without any additional network parameters.
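The two criteria described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the use of cosine distance, the hinge form of the triplet term, the `margin` value, and all function names are hypothetical. The embeddings are assumed to come from some speaker encoder applied to the reference utterance, the separated output, and the residual.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the two embeddings point the same way."""
    a, b = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def speaker_representation_loss(ref_emb, est_emb):
    """Assumed form: distance between reference and separated-output embeddings."""
    return cosine_distance(ref_emb, est_emb)


def triplet_speaker_representation_loss(ref_emb, est_emb, res_emb, margin=0.5):
    """Assumed hinge-style triplet: the reference is the anchor, the separated
    output is the positive, and the residual is the negative. The loss is zero
    once the residual is at least `margin` farther from the reference than the
    separated output is."""
    d_pos = cosine_distance(ref_emb, est_emb)
    d_neg = cosine_distance(ref_emb, res_emb)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings (hypothetical values, not data from the paper).
reference = [1.0, 0.0]
separated = [1.0, 0.0]       # matches the target speaker
clean_residual = [0.0, 1.0]  # carries no target-speaker information
leaky_residual = [1.0, 0.0]  # still sounds like the target speaker

loss_match = speaker_representation_loss(reference, separated)
loss_clean = triplet_speaker_representation_loss(reference, separated, clean_residual)
loss_leaky = triplet_speaker_representation_loss(reference, separated, leaky_residual)
```

In this toy setting the triplet term vanishes when the residual embedding is far from the reference and becomes positive when the residual still resembles the target speaker, which is the behaviour the abstract attributes to the additional criterion. In a training setup these terms would presumably be added to the spectral reconstruction loss with some weighting, which the abstract does not specify.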

Tags: sps conference, icassp 2020 virtual conference, May 2020, icassp 2020
