Progressive Multi-stage Neural Audio Codec with Psychoacoustic Loss and Discriminator

Byeong Hyeon Kim (Yonsei University); Hyungseob Lim (Yonsei University); Jihyun Lee (Yonsei University); Inseon Jang (Electronics and Telecommunications Research Institute); Hong-Goo Kang (Yonsei University)

07 Jun 2023

In this paper, we improve the efficiency of the progressive multi-stage neural audio codec (PR-Codec) by utilizing perceptually motivated training criteria. Although our baseline PR-Codec successfully reconstructs full-band signals by progressively decoding pre-defined subband signals, it can guarantee transparent quality only at high bit-rates. To reduce the bit-rate while maintaining perceptually transparent quality, we adopt a psychoacoustic model (PAM)-based loss and propose a perceptual weighting discriminator (PWD), which together enable us to synthesize and discriminate audio signals in a perceptually motivated domain. We also introduce scalar quantization with an entropy model to further enhance quantization efficiency. Our experimental results show that, compared to our previous model, the proposed model significantly improves perceptual reconstruction quality at the expense of greater waveform disparity in the time domain.
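
As a rough illustration of why scalar quantization paired with an entropy model can lower the bit-rate, the sketch below quantizes a few made-up latent values with a uniform step and estimates the average code length from the empirical symbol distribution. It is not the authors' implementation: the abstract does not describe the quantizer or the entropy model, so the uniform step, the histogram-based entropy estimate, and all names (`scalar_quantize`, `dequantize`, `empirical_entropy_bits`, `latent`, `step`) are assumptions chosen for illustration. A learned entropy model would presumably take the place of the histogram.

```python
import math
from collections import Counter


def scalar_quantize(values, step):
    """Uniform scalar quantization: map each value to its nearest integer index."""
    return [int(math.floor(v / step + 0.5)) for v in values]


def dequantize(indices, step):
    """Reconstruct approximate values from the integer indices."""
    return [i * step for i in indices]


def empirical_entropy_bits(indices):
    """Average bits per symbol under the empirical distribution of the indices.

    This histogram stands in for an entropy model (an assumption of this
    sketch); it lower-bounds the rate of an ideal entropy coder using it.
    """
    counts = Counter(indices)
    total = len(indices)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical latent values, invented for this example.
latent = [0.03, -0.12, 0.48, 0.51, -0.02, 0.07, 1.02, -0.49]
step = 0.5  # hypothetical quantization step size

indices = scalar_quantize(latent, step)
reconstructed = dequantize(indices, step)
bits_per_symbol = empirical_entropy_bits(indices)

print(indices)          # [0, 0, 1, 1, 0, 0, 2, -1]
print(bits_per_symbol)  # 1.75
```

Four distinct indices would cost 2 bits each with a fixed-length code, whereas the skewed distribution here allows an entropy coder to approach 1.75 bits per symbol. The reconstruction error of each value is bounded by half the step size.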
