Computational Efficient Monaural Speech Enhancement with Universal Sample rate Band-split RNN

Jianwei Yu (Tencent AI Lab); Yi Luo (Tencent AI Lab)

06 Jun 2023

While recent developments in the design of neural networks have greatly advanced the state of the art in speech enhancement and separation, practical applications of such networks often impose extra constraints on model size and computational complexity. Moreover, since different telecommunication services may have different transmission bandwidths, which result in different signal sample rates, a model is typically designed for one particular sample rate. In this paper, we extend the usage of a recently proposed frequency-domain source separation model, the band-split RNN (BSRNN), to the task of universal-sample-rate, resource-efficient speech enhancement. BSRNN explicitly splits the spectrogram into different frequency bands and performs interleaved band-level and sequence-level modeling, and the bandwidths can be manually designed to balance model size, computational cost, and performance. By properly designing the band-splitting scheme and the hyperparameters, a single BSRNN model can handle signals at a wide range of sample rates, and the computational cost required to process a lower-sample-rate signal can be smaller than that required for a higher-sample-rate signal. Experimental results show that, compared to various benchmark systems in speech enhancement and separation, our universal-sample-rate BSRNN (USR-BSRNN) achieves comparable or better signal-to-noise ratio (SNR) performance at the same level of model size or computational cost.
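To illustrate why a band-split design lets one model cover several sample rates while spending less compute on lower rates, here is a minimal sketch. It is not the authors' configuration: the band layout `LAYOUT_HZ`, the 20 ms window, and the helpers `bands_hz` and `split_bins` are hypothetical. The sketch assumes bands are defined in Hz and the STFT window has a fixed duration, so each frequency bin spans the same number of Hz at every sample rate. Bands above the Nyquist frequency are simply dropped, so a lower sample rate yields fewer bands.

```python
from typing import List, Sequence, Tuple

# HYPOTHETICAL band layout, not the published USR-BSRNN configuration:
# each entry is (upper_edge_hz, bandwidth_hz); low frequencies get narrower bands.
LAYOUT_HZ: Sequence[Tuple[int, int]] = (
    (1000, 100),
    (4000, 250),
    (8000, 500),
    (24000, 2000),  # covers up to the Nyquist frequency of 48 kHz audio
)


def bands_hz(layout: Sequence[Tuple[int, int]] = LAYOUT_HZ) -> List[Tuple[int, int]]:
    """Expand (upper_edge, width) pairs into contiguous (low, high) bands in Hz."""
    bands: List[Tuple[int, int]] = []
    low = 0
    for upper, width in layout:
        while low < upper:
            high = min(low + width, upper)
            bands.append((low, high))
            low = high
    return bands


def split_bins(sample_rate: int, win_ms: float = 20.0,
               layout: Sequence[Tuple[int, int]] = LAYOUT_HZ) -> List[Tuple[int, int]]:
    """Return [start, stop) STFT bin ranges, one per band, for this sample rate.

    Assumes a window of fixed duration (win_ms), so n_fft grows with the
    sample rate and each bin spans 1000 / win_ms Hz at the rates used here.
    Bands starting at or above Nyquist are dropped, so lower sample rates
    produce fewer bands.
    """
    n_fft = int(round(sample_rate * win_ms / 1000.0))
    n_bins = n_fft // 2 + 1  # one-sided spectrum, DC up to Nyquist
    hz_per_bin = sample_rate / n_fft
    nyquist = sample_rate / 2
    ranges: List[Tuple[int, int]] = []
    for low, high in bands_hz(layout):
        if low >= nyquist:
            break
        start = int(round(low / hz_per_bin))
        if high >= nyquist:
            # Last usable band: extend it so the Nyquist bin is included.
            ranges.append((start, n_bins))
            break
        ranges.append((start, int(round(high / hz_per_bin))))
    return ranges


if __name__ == "__main__":
    for sr in (8000, 16000, 48000):
        print(sr, len(split_bins(sr)))
    # 8000 22
    # 16000 30
    # 48000 38
```

With this layout the low-frequency bands map to identical bin ranges at every sample rate, so the same band-level weights can be shared; only the upper bands disappear as the sample rate drops. If the model runs its sequence-level modeling once per band, the band count is a rough proxy for how the computational cost shrinks for lower-sample-rate input.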

Tags:

Optimization methods for signal processing

Included in: IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person conference), Presentation Videos Product Bundle

More Like This

Element Selection with Wide Class of Optimization Criteria Using Non-convex Sparse Optimization

Elliptical Wishart distribution: maximum likelihood estimator from information geometry

On the primal and dual formulations of the Discrete Mumford-Shah functional

Join the IEEE Signal Processing Society