END-TO-END NEURAL AUDIO CODING IN THE MDCT DOMAIN
Hyungseob Lim (Yonsei University); Jihyun Lee (Yonsei University); Byeong Hyeon Kim (Yonsei University); Inseon Jang (Electronics and Telecommunications Research Institute); Hong-Goo Kang (Yonsei University)
IEEE Signal Processing Society (SPS)
Modern deep neural network (DNN)-based audio coding approaches rely on complicated non-linear functions (e.g., convolutional neural networks and non-linear activations), which leads to high computational complexity and memory usage. Despite this cost, their decoded audio quality is still not much higher than that of signal-processing-based legacy codecs. In this paper, we propose an effective frequency-domain neural audio coding paradigm that adopts the modified discrete cosine transform (MDCT) for analysis and synthesis and DNNs for the quantization of its coefficients. It includes an efficient method for encoding MDCT bins as well as a mechanism for adapting the quantization level of each bin. Our neural audio codec is trained in an end-to-end manner with the help of a psychoacoustics-based perceptual loss, removing the burden of module-by-module fine-tuning. Experimental results show that the proposed model's performance is comparable to that of the MP3 codec at bit-rates of around 64 and 48 kbps for mono signals.
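The abstract's analysis/synthesis front-end is the standard MDCT with 50% overlap. As a rough illustration of that transform (not the paper's implementation, and the frame size, window choice, and framing details here are assumptions), the sketch below uses the direct O(N^2) MDCT/IMDCT formulas with a sine window satisfying the Princen-Bradley condition, so that windowed overlap-add cancels the time-domain aliasing and reconstructs the input:

```python
import numpy as np

def mdct(frame):
    """Forward MDCT: 2N windowed samples -> N coefficients (direct O(N^2) form)."""
    N = len(frame) // 2
    n = np.arange(2 * N)
    k = np.arange(N)
    C = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return C @ frame

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N aliased samples; overlap-add cancels aliasing."""
    N = len(coeffs)
    n = np.arange(2 * N)
    k = np.arange(N)
    C = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (2.0 / N) * (C @ coeffs)

def sine_window(N):
    """Sine window: satisfies Princen-Bradley, w[n]^2 + w[n+N]^2 = 1."""
    n = np.arange(2 * N)
    return np.sin(np.pi / (2 * N) * (n + 0.5))

def analysis(x, N):
    """Frame x (length a multiple of N) with hop N and return MDCT coefficients.

    Zero-pads by N on both ends so boundary frames also reconstruct exactly.
    """
    w = sine_window(N)
    xp = np.concatenate([np.zeros(N), x, np.zeros(N)])
    return [mdct(w * xp[i:i + 2 * N]) for i in range(0, len(xp) - N, N)]

def synthesis(frames, N):
    """Windowed overlap-add of IMDCT outputs; trims the analysis zero-padding."""
    w = sine_window(N)
    out = np.zeros(N * (len(frames) + 1))
    for t, c in enumerate(frames):
        out[t * N:t * N + 2 * N] += w * imdct(c)
    return out[N:-N]
```

In a codec such as the one described above, the per-bin quantization would act on the `analysis` output before `synthesis`; here the round trip is lossless up to floating-point error, which is a quick sanity check that the transform pair is consistent.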