Speech Time-Scale Modification With GANs

Eyal Cohen (Technion); Joseph Keshet (Technion - Israel Institute of Technology); Felix Kreuk (Bar-Ilan University)

09 Jun 2023

While listening to spoken content, it is often desirable to vary the speech rate while preserving the speaker’s timbre and pitch. To date, advanced signal processing techniques have been used to address this task, but maintaining high speech quality at all time-scales remains a challenge. Inspired by the success of speech generation using Generative Adversarial Networks (GANs), we propose a novel unsupervised learning algorithm for time-scale modification (TSM) of speech, called ScalerGAN. The model is trained on a set of speech utterances for which no time-scales are provided. ScalerGAN is composed of three parts: a generator that takes speech and the desired rate as input and outputs time-adjusted speech; a discriminator that operates on various spectrum scales; and a decoder that converts the time-adjusted signal back to the original rate to maintain consistency. Human listeners compared ScalerGAN with other state-of-the-art TSM methods in an A/B test and a conditional A/B test. The results showed that listeners rated the speech quality of ScalerGAN above that of all the other methods.
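The round trip described in the abstract (scale to a new rate, then convert back to the original rate and compare) can be illustrated with a minimal sketch. This is not ScalerGAN: plain linear interpolation along the time axis of a spectrogram stands in for the learned generator and decoder, and all function names, the (frequency, time) array layout, and the L1 error measure are assumptions made for illustration only.

```python
import numpy as np

def resample_time(spec: np.ndarray, n_out: int) -> np.ndarray:
    """Linearly interpolate a (freq, time) spectrogram to n_out frames."""
    n_in = spec.shape[1]
    src = np.linspace(0.0, n_in - 1, num=n_out)  # fractional source frame positions
    grid = np.arange(n_in)
    return np.stack([np.interp(src, grid, row) for row in spec])

def scale(spec: np.ndarray, rate: float) -> np.ndarray:
    """Naive stand-in for a generator: rate > 1 speeds up, rate < 1 slows down."""
    n_out = max(2, int(round(spec.shape[1] / rate)))
    return resample_time(spec, n_out)

def cycle_consistency_l1(spec: np.ndarray, rate: float) -> float:
    """Scale by `rate`, map back to the original length, return the mean L1 error."""
    scaled = scale(spec, rate)                        # hypothetical generator step
    restored = resample_time(scaled, spec.shape[1])   # hypothetical decoder step
    return float(np.mean(np.abs(restored - spec)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.random((80, 100))  # 80 frequency bins, 100 frames of synthetic data
    print(scale(spec, 2.0).shape, scale(spec, 0.5).shape)
    print(cycle_consistency_l1(spec, 1.5))
```

In a learned system the interpolation would be replaced by neural networks and the reconstruction error would be one training term alongside the adversarial loss; the sketch only shows how a consistency check can be computed without any ground-truth time-scaled targets.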

Tags:

Image, Video, and Multidimensional Signal Processing

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
