Signal Processing for Blind Source Separation of Speech and Music
Dr. Hiroshi Sawada
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 01:31:46
Humans can naturally separate mixed speech signals and distinguish individual musical instruments. Building such a capability into a computer contributes to automatic speech recognition in noisy environments, hearing aids, music analysis, and more. This talk starts by defining the task of blind source separation (BSS): separating sound sources from their mixtures with as little prior information as possible. We will have a live demonstration of BSS that separates two simultaneous speech signals recorded with a stereo IC recorder. Various signal processing techniques for BSS will then be explained. Independent component analysis (ICA) is a primary method for BSS: it makes the outputs statistically independent, driving them away from a stationary Gaussian (normal) distribution. However, to separate sounds mixed in a real reverberant environment, we additionally need to solve a source modeling task. Nonnegative matrix factorization (NMF) models a sound source by identifying frequently recurring sound patterns through a low-rank approximation. Independent low-rank matrix analysis (ILRMA) is a sophisticated integration of ICA and NMF that achieves BSS in a real reverberant environment. For the more challenging underdetermined case, where the sources outnumber the microphones, full-rank spatial covariance analysis (FCA) is effective. All of these signal processing techniques can be connected by modeling the sound sources with time-varying Gaussian distributions.
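The ICA idea sketched in the abstract — make the outputs independent by pushing them away from Gaussianity — can be illustrated with a minimal two-channel example. This is a toy sketch, not the speaker's method: it assumes an instantaneous (non-reverberant) mixture of two invented non-Gaussian signals, whitens the mixtures, and then searches the remaining rotation for maximally non-Gaussian outputs using excess kurtosis as the contrast.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Two independent non-Gaussian sources (invented toy signals standing
# in for speech): a Laplacian signal and a uniform signal.
S = np.vstack([rng.laplace(size=n), rng.uniform(-1.0, 1.0, size=n)])
S = (S - S.mean(axis=1, keepdims=True)) / S.std(axis=1, keepdims=True)

# Instantaneous mixing, as assumed by basic ICA (no reverberation).
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S

# Step 1: whiten the mixtures (zero mean, identity covariance).
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(Xc))
Z = np.diag(d ** -0.5) @ E.T @ Xc

# Step 2: only a rotation ambiguity remains after whitening; pick the
# angle whose outputs are farthest from Gaussian (largest |kurtosis|).
def contrast(theta):
    c, s = np.cos(theta), np.sin(theta)
    Y = np.array([[c, -s], [s, c]]) @ Z
    kurt = np.mean(Y ** 4, axis=1) - 3.0  # excess kurtosis per output
    return np.sum(kurt ** 2)

thetas = np.linspace(0.0, np.pi / 2, 180, endpoint=False)
best = thetas[np.argmax([contrast(t) for t in thetas])]
c, s = np.cos(best), np.sin(best)
Y = np.array([[c, -s], [s, c]]) @ Z  # estimated sources

# Each estimate should match one true source up to sign/permutation.
corr = np.abs(np.corrcoef(np.vstack([S, Y]))[:2, 2:])
print(corr.max(axis=1))  # both entries close to 1
```

Separating real reverberant recordings is harder: the mixture is convolutive rather than instantaneous, which is exactly why the talk moves on to source modeling, ILRMA, and FCA.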
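NMF's low-rank sound-pattern model can likewise be sketched in a few lines. This toy example, not taken from the talk, factorizes an invented nonnegative matrix (standing in for a magnitude spectrogram, frequency x time) into spectral patterns W and their activations H using the standard Lee-Seung multiplicative updates for the Euclidean cost; all sizes and data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented "magnitude spectrogram": 64 frequency bins x 200 frames,
# built from 3 recurring spectral patterns, so it is rank 3 exactly.
F, T, K = 64, 200, 3
V = rng.random((F, K)) @ rng.random((K, T))

# NMF: V ~= W @ H with W, H >= 0. W holds the frequent sound patterns,
# H says when (and how strongly) each pattern is active.
W = rng.random((F, K))
H = rng.random((K, T))
eps = 1e-9  # guards against division by zero

err0 = np.linalg.norm(V - W @ H)  # initial reconstruction error
for _ in range(200):
    # Multiplicative updates keep W and H nonnegative and
    # monotonically decrease the Euclidean reconstruction error.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
err = np.linalg.norm(V - W @ H)

print(err < err0, bool((W >= 0).all() and (H >= 0).all()))
```

In the BSS setting described above, such a factorization serves as the source model that ICA alone lacks; ILRMA combines the two so that each separated output is encouraged to have a low-rank spectrogram.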