Neural Feature Predictor and Discriminative Residual Coding for Low-Bitrate Speech Coding
Haici Yang (Indiana University); Wootaek Lim (ETRI); Minje Kim (Indiana University)
-
SPS
IEEE Members: $11.00
Non-members: $15.00
Low and ultra-low-bitrate neural speech codecs achieved unprecedented coding gain by generating speech signals from compact features. This paper introduces additional coding efficiency in speech coding by reducing the temporal redundancy existing in the frame-level feature sequence via a feature predictor. This predictor produces low-entropy residual representations, and we discriminatively code them based on their contribution to the signal reconstruction. Combining feature prediction and discriminative coding optimizes bitrate efficiency by assigning more bits to hard-to-predict events. We demonstrate the advantage of the proposed methods using the LPCNet as a neural vocoder, resulting in a scalable, lightweight, low-latency, and low-bitrate neural speech coding system. While our approach guarantees strict causality in the frame-level prediction,
the subjective tests and feature space analysis show that our model achieves superior coding efficiency compared to the loosely-causal LPCNet and Lyra V2 in the very low bitrates.