SPEECH ENHANCEMENT FOR LOW BIT RATE SPEECH CODEC

Ju Lin, Kaustubh Kalgaonkar, Qing He, Xin Lei

SPS

Length: 00:10:40

11 May 2022

A speech codec compresses the input signal into a compact bit stream, which the receiver then decodes to produce the best possible perceptual quality. This compression makes storing and transmitting speech efficient. In this work, we propose a neural extension to low bit rate speech codecs (e.g., Codec2) that aims to improve the perceptual quality of the synthesized speech. Our proposed framework combines the decoded audio with neural embeddings while leaving existing speech coders unchanged. In addition to the embeddings, we use a least-squares generative adversarial network (LSGAN) to reduce artifacts and prevent over-smoothing in the reconstructed audio. Mean Opinion Scores (MOS) from listening tests show that our framework can raise the quality of speech encoded at 3.6 kbps above that of speech encoded with Opus at 6 kbps.
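For readers unfamiliar with LSGAN, the least-squares objective it is named after can be sketched as follows. This is the generic LSGAN formulation of Mao et al. (targets 1 for real and 0 for generated samples), written with NumPy; it is an illustrative assumption, not the authors' training code, and the discriminator scores below are made-up numbers.

```python
import numpy as np

def lsgan_discriminator_loss(d_real, d_fake):
    """Generic LSGAN discriminator loss (not the paper's exact setup).

    Pushes scores on real (clean) audio toward 1 and scores on
    generated (enhanced) audio toward 0 with a squared error.
    """
    d_real = np.asarray(d_real, dtype=np.float64)
    d_fake = np.asarray(d_fake, dtype=np.float64)
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)

def lsgan_generator_loss(d_fake):
    """Generic LSGAN generator loss: pushes scores on generated audio toward 1."""
    d_fake = np.asarray(d_fake, dtype=np.float64)
    return 0.5 * np.mean((d_fake - 1.0) ** 2)

# Hypothetical discriminator outputs, for illustration only.
d_real = np.array([0.9, 1.1, 1.0])
d_fake = np.array([0.2, 0.0, -0.2])
print(lsgan_discriminator_loss(d_real, d_fake))
print(lsgan_generator_loss(d_fake))
```

Because the penalty is quadratic rather than a saturating sigmoid cross-entropy, generated samples that the discriminator already scores on the "real" side of the boundary still receive a gradient in proportion to their distance from the target, which is the property commonly cited for LSGAN's more stable training.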

Tags:

generative adversarial network

speech codec

vq-vae

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

OMISSION-FREE INPAINTING: A THREE-STAGE APPROACH TO ENSURE OBJECT GENERATION

MDFD: STUDY OF DISTRIBUTED NON-IID SCENARIOS AND FRECHET DISTANCE-BASED EVALUATION

LOW-SAMPLING-FREQUENCY PLANE WAVE MEDICAL ULTRASOUND IMAGING BASED ON ADVERSARIAL LEARNING
