  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:15:09
09 May 2022

We propose a time-domain audio source separation method based on multiresolution analysis (multiresolution deep layered analysis: MRDLA). MRDLA is based on a time-domain deep neural network (DNN) called Wave-U-Net, which performs successive down-sampling (DS) and up-sampling of features. From the signal processing viewpoint, we found that the DS layers of Wave-U-Net cause aliasing and may discard information useful for source separation because they are implemented with decimation. To overcome these two problems, we focus on the architectural resemblance between the successive DS of Wave-U-Net and multiresolution analysis (MA). MA uses discrete wavelet transforms (DWTs), which have anti-aliasing filters and the perfect reconstruction property. We thus develop DWT-based DS layers (DWT layers). We further extend the DWT layers such that their wavelet basis functions can be trained together with the other DNN components while maintaining the perfect reconstruction property. Since a straightforward trainable extension of the DWT layers does not guarantee the existence of anti-aliasing filters, we derive constraints for this guarantee in addition to the perfect reconstruction property. Through music source separation experiments including subjective evaluations, we show the effectiveness of the proposed methods and the importance of simultaneously considering both the anti-aliasing filters and the perfect reconstruction property.
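The contrast between decimation and a DWT-based DS layer can be illustrated with a minimal NumPy sketch. It is not the MRDLA implementation: it assumes the fixed Haar wavelet rather than the trainable wavelets described above, works on a single 1-D signal instead of DNN feature maps, and the function names (`decimate`, `haar_dwt`, `haar_idwt`) are hypothetical.

```python
import numpy as np

def decimate(x):
    """Plain decimation: keep even-indexed samples and discard the rest."""
    return x[0::2]

def haar_dwt(x):
    """One level of the Haar DWT on an even-length 1-D signal.

    Returns (approx, detail), each half the length of x.
    """
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass branch, filtered before DS
    detail = (even - odd) / np.sqrt(2.0)  # high-pass branch, keeps the remainder
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the original signal from both branches."""
    even = (approx + detail) / np.sqrt(2.0)
    odd = (approx - detail) / np.sqrt(2.0)
    x = np.empty(2 * approx.shape[0], dtype=np.result_type(approx, detail))
    x[0::2] = even
    x[1::2] = odd
    return x

# Illustrative input (an assumption): a tone at the Nyquist frequency, +1, -1, +1, ...
n = np.arange(16)
x = (-1.0) ** n

aliased = decimate(x)                 # every sample is +1: the tone aliases to DC
approx, detail = haar_dwt(x)          # low-pass branch is zero, detail holds the tone
reconstructed = haar_idwt(approx, detail)

print(np.allclose(aliased, 1.0))      # decimation turned the tone into a constant
print(np.allclose(approx, 0.0))       # the low-pass filter suppressed the tone
print(np.allclose(reconstructed, x))  # both branches together recover the input
```

In this sketch, decimation maps the highest-frequency tone onto a constant and drops half of the samples, whereas the Haar step halves the time resolution while keeping both branches, so the input can be recovered from them.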
