SeliNet: A Lightweight Model for Single Channel Speech Separation

Ha Minh Tan (National Central University); Duc-Quang Vu (Thai Nguyen University of Education); Jia-Ching Wang (National Central University)

SPS

07 Jun 2023

Time-domain speech separation methods based on deep learning have achieved impressive performance. However, balancing computational complexity, model size, and performance remains a challenge for deployment on real-time, low-resource devices. In this paper, we introduce SeliNet, a lightweight yet effective network for speech separation. SeliNet is a one-dimensional convolutional architecture that employs bottleneck modules and atrous temporal pyramid pooling. In the bottleneck modules, depth-wise separable convolution significantly reduces model size and computational cost, while squeeze-and-excitation uses a context vector to interact with the entire hidden state vector. The atrous temporal pyramid pooling captures long sequences of various lengths and extracts context at different fields of view. Together, these components allow SeliNet to obtain impressive performance while keeping computational cost and model size small.
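
To make the size argument concrete, the following is a minimal sketch of the arithmetic behind two ideas named in the abstract: the weight count of a depth-wise separable 1-D convolution versus a standard one, and the span covered by a single atrous (dilated) convolution. The channel count, kernel size, and dilation rates are hypothetical values chosen for illustration; they are not taken from the paper, and the helper names are not part of any SeliNet code.

```python
def conv1d_params(c_in, c_out, kernel_size):
    """Weight count of a standard 1-D convolution (bias omitted)."""
    return c_in * c_out * kernel_size


def depthwise_separable_params(c_in, c_out, kernel_size):
    """Weight count of a depth-wise conv followed by a 1x1 point-wise conv."""
    depthwise = c_in * kernel_size   # one filter per input channel
    pointwise = c_in * c_out         # 1x1 conv that mixes channels
    return depthwise + pointwise


def receptive_field(kernel_size, dilation):
    """Number of time steps spanned by one dilated 1-D convolution."""
    return dilation * (kernel_size - 1) + 1


# Hypothetical sizes, assumed for illustration only.
C_IN, C_OUT, K = 128, 128, 3
DILATIONS = (1, 2, 4, 8)

standard = conv1d_params(C_IN, C_OUT, K)
separable = depthwise_separable_params(C_IN, C_OUT, K)
fields = [receptive_field(K, d) for d in DILATIONS]

print(f"standard conv weights:  {standard}")
print(f"separable conv weights: {separable}")
print(f"reduction factor:       {standard / separable:.2f}x")
print(f"fields of view:         {fields}")
```

Under these assumed sizes the separable layer needs roughly a third of the weights, and parallel branches with different dilation rates see different amounts of temporal context with the same kernel size, which is the intuition behind pyramid pooling over several fields of view.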

Tags:

Emerging topics in signal processing systems