
Predictive SkiM: Contrastive Predictive Coding for Low-Latency Online Speech Separation

Chenda Li (Shanghai Jiao Tong University); Yifei Wu (Shanghai Jiao Tong University); Yanmin Qian (Shanghai Jiao Tong University)

07 Jun 2023

In online speech separation, there is a trade-off between latency and separation performance. When processing the current input audio, looking ahead at more future context usually improves separation quality but increases algorithmic latency, and vice versa. Under extremely low-latency requirements, future context is costly in terms of algorithmic latency and may not be available at all. In this work, we apply contrastive predictive coding (CPC) to the previously proposed Skipping Memory (SkiM) model, a low-latency model for online speech separation. During training, the SkiM model is required to predict future memory states given the history of its memory. With CPC training, the predictive SkiM model shows stronger causal sequence modeling capacity in the online speech separation task. In addition, we explore a local context codec (LCC) method to reduce the computational cost and provide qualitative analyses of it. Our best online predictive SkiM, equipped with CPC and LCC, achieves 15.5 dB SI-SNR improvement on the WSJ0-2mix benchmark with 3 ms actual latency measured on a single-core CPU, which is, to the best of our knowledge, a state-of-the-art result among causal models.
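
The abstract describes training SkiM to predict its own future memory states with a CPC-style objective. The sketch below illustrates one plausible form of that objective in PyTorch: an InfoNCE loss where the prediction of a memory state k segments ahead is scored against the true future state (positive) and future states from other positions (negatives). The function and argument names (`cpc_memory_loss`, `predictor`, `mem`, `k`) are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of a CPC/InfoNCE objective over SkiM memory states,
# assuming a PyTorch setting. All names here are hypothetical.
import torch
import torch.nn.functional as F


def cpc_memory_loss(mem: torch.Tensor, predictor: torch.nn.Module, k: int = 1) -> torch.Tensor:
    """InfoNCE loss asking the model to predict memory states k segments ahead.

    mem:       (B, S, D) sequence of memory states, one per segment.
    predictor: maps a memory state to a prediction of the state k steps later.
    """
    context = mem[:, :-k, :]       # states used as history
    target = mem[:, k:, :]         # ground-truth future states
    pred = predictor(context)      # (B, S-k, D) predicted future states

    B, T, D = pred.shape
    pred = pred.reshape(B * T, D)
    target = target.reshape(B * T, D)

    # Score every prediction against every candidate target; matching
    # (diagonal) entries are positives, all others serve as negatives.
    logits = pred @ target.t() / D ** 0.5
    labels = torch.arange(B * T, device=mem.device)
    return F.cross_entropy(logits, labels)


# Example usage with a simple linear predictor head on dummy memory states.
predictor = torch.nn.Linear(64, 64)
mem = torch.randn(2, 20, 64)       # batch=2, 20 segments, memory dim=64
loss = cpc_memory_loss(mem, predictor, k=1)
loss.backward()
```

In practice this auxiliary loss would be combined with the separation loss (e.g., SI-SNR) during training; at inference time only the causal SkiM forward pass is needed, so the predictor adds no latency.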
