SPS
IEEE Members: $11.00
Non-members: $15.00
Pages/Slides: 41
Beamforming, the most widely used spatial filtering approach for multi-channel signal separation, extracts the target signal arriving from a specific direction. We present an emerging approach based on multi-channel complex spectral mapping, which trains a deep neural network (DNN) to directly estimate the real and imaginary spectrograms of the target signal from those of the multi-channel noisy mixture. In this all-neural approach, the trained DNN itself becomes a nonlinear, time-varying spectrospatial filter.
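The input/output layout implied by this description can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the array shapes, the function names, and in particular `placeholder_network` (a fixed linear mix standing in for the trained DNN) are hypothetical and are not taken from the presented system.

```python
import numpy as np

def stack_real_imag(mixture_stft):
    """(P, T, F) complex mixture -> (2P, T, F) real features:
    the P real parts followed by the P imaginary parts."""
    return np.concatenate([mixture_stft.real, mixture_stft.imag], axis=0)

def placeholder_network(features, weights):
    """Hypothetical stand-in for the trained DNN (assumption, not the
    real model): a fixed linear mix of the 2P input maps into two
    output maps. weights has shape (2, 2P); the result is (2, T, F)."""
    return np.tensordot(weights, features, axes=([1], [0]))

def to_complex(estimate):
    """(2, T, F) real output -> (T, F) complex target spectrogram."""
    return estimate[0] + 1j * estimate[1]

# Toy mixture: P microphones, T frames, F frequency bins (arbitrary sizes).
rng = np.random.default_rng(0)
P, T, F = 4, 10, 257
mixture = rng.standard_normal((P, T, F)) + 1j * rng.standard_normal((P, T, F))

features = stack_real_imag(mixture)

# Trivial weights that copy the real and imaginary parts of channel 0,
# chosen only so the data flow is easy to verify.
weights = np.zeros((2, 2 * P))
weights[0, 0] = 1.0   # real output <- real part of channel 0
weights[1, P] = 1.0   # imaginary output <- imaginary part of channel 0

estimate = to_complex(placeholder_network(features, weights))
```

With `P = 1` the same layout has two input maps and two output maps, which matches the single-channel (monaural) case. A real system would replace `placeholder_network` with a trained nonlinear model.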
How does this conceptually simple approach perform relative to commonly used beamforming techniques on different array configurations and in different acoustic environments? We examine this question systematically on speech dereverberation, speech enhancement, and speaker separation tasks. Comprehensive evaluations show that multi-channel complex spectral mapping achieves speech separation performance comparable to or better than beamforming for different array geometries, and reduces to monaural complex spectral mapping in single-channel conditions, demonstrating the versatility of this new approach for multi-channel and single-channel speech separation. In addition, this approach is computationally more efficient than popular mask-based beamforming. We conclude that this neural spectrospatial filter provides a strong alternative to traditional and mask-based beamforming.