MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial Learning

Mohammad Reza Hasanabadi (Shahid Beheshti University); Majid Behdad (Shahid Beheshti University); Davood Gharavian (Shahid Beheshti University)

09 Jun 2023

In this paper, we introduce MFCCGAN, a novel speech synthesizer based on adversarial learning that takes MFCCs as input and generates raw speech waveforms. By exploiting the capabilities of the GAN model, it produces speech with higher intelligibility than WORLD, a rule-based MFCC-based speech synthesizer. We evaluated the model with a popular intrusive objective speech intelligibility measure (STOI) and an objective quality measure (NISQA score). Experimental results show that the proposed system outperforms Librosa MFCC inversion (by about 26% to 53% in STOI and about 16% to 78% in NISQA score) and achieves a rise of about 10% in intelligibility and about 4% in naturalness compared with the conventional rule-based vocoder WORLD, which is used in the CycleGAN-VC family. Note, however, that WORLD needs additional data such as F0. Finally, using a STOI-based perceptual loss in the discriminators could further improve quality. WebMUSHRA-based subjective tests also demonstrate the quality of the proposed approach.
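To make the input representation concrete, the following is a minimal sketch of how MFCCs can be computed for a single audio frame using only the Python standard library. It is not the authors' feature extractor: the Hann window, the 20 mel filters, the 13 coefficients, and all function names are assumptions chosen for illustration. The sketch shows that MFCCs keep only a smoothed log-mel spectral envelope and discard phase, which is one reason turning MFCCs back into a waveform is a lossy problem.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale (common 2595*log10 formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum of one real frame (bins 0..N/2). Phase is dropped here."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = [max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid))) for f in bin_hz]
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs of one frame: Hann window -> power spectrum -> log-mel energies -> DCT-II."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1))) for t, x in enumerate(frame)]
    power = power_spectrum(windowed)
    bank = mel_filterbank(n_filters, n, sample_rate)
    # Small constant avoids log(0) for silent bands (illustrative choice).
    log_energies = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10) for row in bank]
    return [
        sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters) for j, e in enumerate(log_energies))
        for c in range(n_coeffs)
    ]


# Usage: one 256-sample frame of a synthetic 440 Hz tone at 8 kHz (hypothetical test signal).
frame = [math.sin(2 * math.pi * 440.0 * t / 8000.0) for t in range(256)]
coeffs = mfcc_frame(frame, sample_rate=8000)
print(len(coeffs))
```

Each frame is reduced to a handful of coefficients, so a synthesizer that starts from MFCCs has to reconstruct the discarded detail, whether through a learned generator or through extra inputs such as F0.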

Tags:

Speech production, perception and psychoacoustics

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
