Switching Independent Vector Analysis and Its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithms
Tomohiro Nakatani (NTT Communication Science Laboratories); Rintaro Ikeshita (NTT); Keisuke Kinoshita (Google); Hiroshi Sawada (NTT); Naoyuki Kamo (NTT); Shoko Araki (NTT Corporation)
SPS
This paper develops a framework that can accurately perform denoising, dereverberation, and source separation using a relatively small number of microphones. It has been empirically confirmed that Independent Vector Analysis (IVA) can blindly separate N sources from their sound mixture, even in the presence of diffuse noise, when a sufficiently large number M of microphones is available (i.e., M >> N). However, the estimation accuracy degrades seriously as the number of microphones, or more specifically M−N (>= 0), decreases. To overcome this limitation of IVA, we propose switching IVA (swIVA). With swIVA, the time frames of an observed signal with time-varying characteristics are clustered into several groups, each of which can be handled well by IVA with a small number of microphones, so that accurate estimation is achieved by applying IVA to each group individually. A switching mechanism of this kind was previously introduced into a Minimum-Variance Distortionless Response (MVDR) beamformer; this paper extends the mechanism to work with a blind source separation algorithm. To incorporate dereverberation capability, we further extend swIVA to a blind convolutional beamforming algorithm (swCIVA) that integrates swIVA and switching Weighted Prediction Error-based dereverberation (swWPE) in a jointly optimal way. With swCIVA, two different time-varying characteristics of an observed signal are captured, one for dereverberation and one for source separation, to achieve effective estimation. We show that both swIVA and swCIVA can be optimized effectively based on blind signal processing, and that their performance can be further improved by using a spatial guide for initialization. Experiments demonstrate that both proposed methods substantially outperformed conventional IVA and its convolutional beamforming extension (CIVA) in terms of objective signal quality and automatic speech recognition scores when using relatively few microphones.
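The switching idea described in the abstract, in which frames are grouped and each group gets its own separation filter, can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: it only shows how per-group demixing matrices might be applied once they and the frame grouping are already known. The function name `apply_switched_demixing`, the array layout, the single-frequency-bin setting, and the use of hard (one group per frame) assignments are all assumptions made for illustration; how swIVA estimates the groups and the matrices is not shown here.

```python
import numpy as np


def apply_switched_demixing(x, demix, groups):
    """Apply one demixing matrix per frame group in a single frequency bin.

    Hypothetical helper for illustration only.

    x      : (M, T) complex array, microphone signals over T time frames
    demix  : (K, N, M) complex array, one N-by-M demixing matrix per group
    groups : (T,) int array, group index in [0, K) assigned to each frame
    returns: (N, T) complex array of separated outputs
    """
    x = np.asarray(x)
    demix = np.asarray(demix)
    groups = np.asarray(groups)

    n_out = demix.shape[1]
    y = np.zeros((n_out, x.shape[1]), dtype=np.result_type(x, demix))

    # Each group of frames is filtered by its own demixing matrix.
    for k in range(demix.shape[0]):
        mask = groups == k
        y[:, mask] = demix[k] @ x[:, mask]
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, N, T, K = 3, 2, 6, 2  # microphones, sources, frames, groups (assumed sizes)
    x = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
    demix = rng.standard_normal((K, N, M)) + 1j * rng.standard_normal((K, N, M))
    groups = np.array([0, 0, 1, 1, 0, 1])  # assumed frame-to-group assignment
    y = apply_switched_demixing(x, demix, groups)
    print(y.shape)  # (2, 6)
```

With K = 1 the sketch reduces to applying a single time-invariant demixing matrix, which corresponds to the conventional, non-switching case in this simplified picture.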