    Length: 00:09:49
09 May 2022

Recently, the segmented sample-level modeling approach based on the Dual-Path Recurrent Neural Network (DPRNN) has proven effective for Monaural Speech Separation (MSS). Many dual-path networks that refine DPRNN, such as the Dual-Path Transformer Network (DPTNet), have further improved separation performance because these methods handle long sequences well. However, their receptive fields are fixed during both local and global feature learning, which makes it difficult to capture local and global information at different scales in long sequences. In this paper, we propose a novel Multiscale Time-Delay Sampling (MTDS) method for dual-path networks in MSS. MTDS learns sequence features from fine to coarse through time-delay sampling at multiple scales, which effectively integrates local and global information at different scales for long sequences. Our experiments on the widely used WSJ0-2mix benchmark corpus yield 21.7 dB SDRi and 21.5 dB SI-SNRi, outperforming the state of the art without data augmentation.
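The abstract does not give implementation details, so the following is only a minimal sketch of one plausible reading of time-delay sampling, not the authors' method. It assumes that sampling with delay d reorders a sequence so that neighbouring positions are d steps apart in the original, and that the reordered sequence is then cut into fixed-size chunks as in dual-path segmentation. The function names `time_delay_sample`, `inverse_time_delay_sample`, and `segment`, the zero padding, and the delay values 1, 2, 4 are all hypothetical choices made for illustration.

```python
import numpy as np


def time_delay_sample(x, delay):
    """Reorder a 1-D sequence so neighbours are `delay` steps apart in the original.

    Hypothetical helper: zero-pads to a multiple of `delay` and returns
    the reordered sequence together with the amount of padding added.
    """
    x = np.asarray(x)
    pad = (-len(x)) % delay
    padded = np.pad(x, (0, pad))
    # Rows are consecutive time blocks, columns are phases within a block.
    # Transposing groups all samples of the same phase together.
    return padded.reshape(-1, delay).T.reshape(-1), pad


def inverse_time_delay_sample(y, delay, pad):
    """Undo `time_delay_sample`, dropping the padding that was added."""
    restored = np.asarray(y).reshape(delay, -1).T.reshape(-1)
    return restored[: len(restored) - pad] if pad else restored


def segment(x, chunk):
    """Split a 1-D sequence into non-overlapping chunks (zero-padded at the end)."""
    x = np.asarray(x)
    pad = (-len(x)) % chunk
    return np.pad(x, (0, pad)).reshape(-1, chunk)


if __name__ == "__main__":
    x = np.arange(12)
    # Assumed fine-to-coarse schedule: a larger delay means each chunk
    # spans a wider stretch of the original sequence.
    for delay in (1, 2, 4):
        y, pad = time_delay_sample(x, delay)
        print(f"delay={delay}: first chunk covers original indices {segment(y, 4)[0]}")
```

Under these assumptions, a chunk of fixed size sees a wider span of the original signal as the delay grows, which is one way a fixed-size local model could be exposed to coarser context. The reordering is invertible, so features can be mapped back to the original time order between scales.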