

Ju Lin, Kaustubh Kalgaonkar, Qing He, Xin Lei

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:10:40
11 May 2022

A speech codec compresses the input signal into a compact bit stream, which is then decoded at the receiver to reconstruct speech with the best possible perceptual quality. This compression makes storing and transmitting speech efficient. In this work, we propose a neural extension to a low-bit-rate speech codec (e.g., Codec2) that aims to improve the perceptual quality of the synthesized speech. Our proposed framework combines decoded audio with neural embeddings without breaking compatibility with existing speech coders. In addition to the embeddings, we use a least-squares generative adversarial network (LSGAN) to reduce artifacts and prevent over-smoothing in the reconstructed audio. Mean Opinion Scores (MOS) from listening tests show that our framework can boost the audio quality of speech encoded at 3.6 kbps so that it outperforms speech encoded at 6 kbps using Opus.
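The abstract does not spell out the adversarial objective, so the following is only a minimal sketch of the standard least-squares GAN losses, assuming the common 0/1 labeling (real target 1, fake target 0). The function names and the use of plain Python lists for discriminator scores are hypothetical choices for illustration, not the authors' implementation; in practice the scores would come from a discriminator network applied to real and reconstructed audio.

```python
# Sketch of the standard LSGAN objective (assumed 0/1 labels).
# All names here are hypothetical; the scores stand in for the
# outputs of a discriminator on real and generated audio.

def lsgan_discriminator_loss(real_scores, fake_scores,
                             real_label=1.0, fake_label=0.0):
    """Least-squares loss that pushes real scores toward real_label
    and generated (fake) scores toward fake_label."""
    real_term = sum((s - real_label) ** 2 for s in real_scores) / len(real_scores)
    fake_term = sum((s - fake_label) ** 2 for s in fake_scores) / len(fake_scores)
    return 0.5 * (real_term + fake_term)


def lsgan_generator_loss(fake_scores, target_label=1.0):
    """Least-squares loss that pushes the discriminator's scores on
    generated audio toward the label used for real audio."""
    return 0.5 * sum((s - target_label) ** 2 for s in fake_scores) / len(fake_scores)


if __name__ == "__main__":
    # Toy scores: the discriminator is fairly sure about real audio
    # and undecided about generated audio.
    real = [0.9, 1.0, 0.8]
    fake = [0.5, 0.4, 0.6]
    print("discriminator loss:", lsgan_discriminator_loss(real, fake))
    print("generator loss:", lsgan_generator_loss(fake))
```

Unlike the sigmoid cross-entropy loss of the original GAN formulation, the quadratic penalty still gives a gradient for generated samples that the discriminator already classifies correctly but that lie far from the target label, which is one common motivation for choosing LSGAN.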
