Neural Feature Predictor and Discriminative Residual Coding for Low-Bitrate Speech Coding

Haici Yang (Indiana University); Wootaek Lim (ETRI); Minje Kim (Indiana University)

07 Jun 2023

Low- and ultra-low-bitrate neural speech codecs have achieved unprecedented coding gain by generating speech signals from compact features. This paper improves coding efficiency further by using a feature predictor to reduce the temporal redundancy in the frame-level feature sequence. The predictor produces low-entropy residual representations, which we code discriminatively according to their contribution to signal reconstruction. Combining feature prediction with discriminative coding improves bitrate efficiency by assigning more bits to hard-to-predict events. We demonstrate the advantage of the proposed methods using LPCNet as the neural vocoder, resulting in a scalable, lightweight, low-latency, low-bitrate neural speech coding system. Although our approach guarantees strict causality in the frame-level prediction, subjective tests and feature-space analysis show that our model achieves superior coding efficiency compared to the loosely causal LPCNet and Lyra V2 at very low bitrates.
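The idea of predicting each frame, coding only the residual, and spending more bits on hard-to-predict frames can be illustrated with a toy sketch. This is not the authors' system. It assumes a trivial "previous decoded frame" predictor in place of the neural predictor, a uniform scalar quantizer in place of any learned codebook, and a hypothetical residual-norm threshold for choosing between a fine and a coarse step. The function names and all parameter values are made up for illustration.

```python
import numpy as np


def quantize(x, step):
    """Uniform scalar quantizer: round to the nearest multiple of `step`."""
    return np.round(x / step) * step


def encode_decode(features, threshold=0.5, fine_step=0.05, coarse_step=0.5):
    """Closed-loop predictive coding of a (frames, dims) feature sequence.

    Assumptions (hypothetical, not taken from the paper):
      - predictor: the previously decoded frame
      - quantizer: uniform scalar quantization of the residual
      - bit allocation: fine step only when the residual norm exceeds
        `threshold`, i.e. when the frame was hard to predict
    Returns the decoded features and the number of finely coded frames.
    """
    n_frames, n_dims = features.shape
    decoded = np.zeros((n_frames, n_dims), dtype=float)
    prev = np.zeros(n_dims)  # state shared by encoder and decoder
    fine_frames = 0
    for t in range(n_frames):
        prediction = prev  # strictly causal: uses past decoded frames only
        residual = features[t] - prediction
        if np.linalg.norm(residual) > threshold:
            step = fine_step  # hard-to-predict frame: spend more bits
            fine_frames += 1
        else:
            step = coarse_step  # well-predicted frame: spend fewer bits
        decoded[t] = prediction + quantize(residual, step)
        prev = decoded[t]  # predict from decoded data so errors do not drift
    return decoded, fine_frames


# Slowly varying features with one abrupt change at frame 25.
t = np.arange(50)
features = np.stack([0.01 * t, 0.01 * t], axis=1)
features[25:] += 3.0
decoded, fine_frames = encode_decode(features)
```

Because the predictor runs on decoded frames rather than on the original ones, the per-dimension reconstruction error in this sketch stays within half a quantization step and does not accumulate over time.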

Tags: Pattern recognition and classification

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
