AudioDec: An Open-source Streaming High-fidelity Neural Audio Codec

Yi-Chiao Wu (META); Israel Dejene Gebru (Reality Labs Research); Dejan Markovic (META); Alexander Richard (META)

06 Jun 2023

A good audio codec for live applications such as telecommunication is characterized by three key properties: (1) compression, i.e., the bitrate required to transmit the signal should be as low as possible; (2) latency, i.e., encoding and decoding the signal must be fast enough to enable communication with little or no noticeable delay; and (3) reconstruction quality of the signal. In this work, we propose an open-source, streamable, real-time neural audio codec that achieves strong performance along all three axes: it can reconstruct highly natural-sounding 48 kHz speech signals while operating at only 12 kbps and running with less than 6 ms (GPU)/10 ms (CPU) latency. An efficient training paradigm is also demonstrated for developing such neural audio codecs for real-world scenarios. Both objective and subjective evaluations using the VCTK corpus are provided. To sum up, AudioDec is a well-developed plug-and-play benchmark for audio codec applications.
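
To put the 12 kbps figure in context, the short sketch below compares it with the bitrate of uncompressed PCM audio at the same 48 kHz sampling rate. The 16-bit depth and single channel are assumptions made for illustration only (the abstract does not state them), and the function names are hypothetical rather than part of AudioDec.

```python
# Back-of-the-envelope comparison of raw PCM bitrate vs. the codec bitrate.
# Assumptions (not stated in the abstract): 16-bit samples, mono audio.

def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int = 16, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000.0


def compression_ratio(raw_kbps: float, codec_kbps: float) -> float:
    """How many times smaller the coded stream is than the raw stream."""
    return raw_kbps / codec_kbps


SAMPLE_RATE_HZ = 48_000   # from the abstract
CODEC_KBPS = 12.0         # from the abstract

raw_kbps = pcm_bitrate_kbps(SAMPLE_RATE_HZ)
ratio = compression_ratio(raw_kbps, CODEC_KBPS)

print(f"raw PCM: {raw_kbps:.0f} kbps, codec: {CODEC_KBPS:.0f} kbps, ratio: {ratio:.0f}x")
```

Under these assumptions the raw stream is 768 kbps, so a 12 kbps codec corresponds to roughly a 64-fold reduction; a different bit depth or channel count would change the ratio proportionally.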

Tags:

Audio for multimedia and audio processing systems

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
