MULTI-CHANNEL SPEAKER VERIFICATION WITH CONV-TASNET BASED BEAMFORMER

Ladislav Mošner, Oldřich Plchot, Lukáš Burget, Jan Černocký



Length: 00:14:11

12 May 2022

We focus on the problem of speaker recognition in far-field multi-channel data. Our main contribution is an alternative way of predicting spatial covariance matrices (SCMs) for a beamformer from the time-domain signal. We propose to use Conv-TasNet, a well-known source separation model, and adapt it to speech enhancement by forcing it to separate speech from additive noise. We experiment with using the STFT of the Conv-TasNet outputs to obtain the SCMs of speech and noise, and finally, we fine-tune this multi-channel front-end with respect to the speaker verification objective. We address the lack of a realistic multi-channel training set by using the simulated data of the MultiSV corpus. The analysis is performed on its retransmitted and simulated test parts. We achieve consistent improvements over a baseline built on a mask-estimating neural network, while using a model 2.7 times smaller.
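The abstract describes obtaining speech and noise SCMs from the STFT of the Conv-TasNet outputs and passing them to a beamformer. The NumPy sketch below illustrates only that step, under stated assumptions: the network is assumed to have already produced per-channel time-domain estimates of speech and noise (random signals stand in for them here), the STFT settings are arbitrary, and the beamformer is assumed to be an MVDR in the reference-channel formulation of Souden et al., since the abstract does not name the beamformer type. The helpers `stft`, `spatial_covariance`, `mvdr_weights`, and `apply_beamformer` are hypothetical, not the authors' code.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Multi-channel STFT of x with shape (channels, samples).

    Returns shape (channels, frames, n_fft // 2 + 1). Uses a periodic Hann
    window; a trailing partial frame is dropped. Settings are hypothetical.
    """
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (x.shape[-1] - n_fft) // hop
    frames = np.stack(
        [x[:, i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=-1)


def spatial_covariance(X):
    """SCM per frequency: Phi[f] = (1/T) * sum_t X[:, t, f] X[:, t, f]^H.

    X has shape (channels, frames, freqs); result has shape (freqs, C, C).
    """
    T = X.shape[1]
    return np.einsum("ctf,dtf->fcd", X, X.conj()) / T


def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-6):
    """Assumed MVDR (Souden form): w = Phi_N^{-1} Phi_S u / tr(Phi_N^{-1} Phi_S)."""
    C = phi_s.shape[-1]
    phi_n = phi_n + eps * np.eye(C)  # diagonal loading for invertibility
    num = np.linalg.solve(phi_n, phi_s)  # batched Phi_N^{-1} Phi_S, shape (F, C, C)
    trace = np.trace(num, axis1=-2, axis2=-1)  # shape (F,)
    return num[..., ref] / (trace[:, None] + eps)  # column u = reference channel


def apply_beamformer(w, X):
    """Y[t, f] = w[f]^H X[:, t, f]; w is (F, C), X is (C, T, F)."""
    return np.einsum("fc,ctf->tf", w.conj(), X)


# Usage with hypothetical stand-ins for the Conv-TasNet speech/noise outputs.
rng = np.random.default_rng(0)
n_channels, n_samples = 4, 16000
speech_est = rng.standard_normal((n_channels, n_samples))
noise_est = 0.5 * rng.standard_normal((n_channels, n_samples))
mixture = speech_est + noise_est

phi_s = spatial_covariance(stft(speech_est))
phi_n = spatial_covariance(stft(noise_est))
w = mvdr_weights(phi_s, phi_n, ref=0)
enhanced = apply_beamformer(w, stft(mixture))  # single-channel STFT, (frames, freqs)
```

The point of the design is that the SCMs come from time-domain source estimates rather than from predicted masks, so the downstream beamforming formula is unchanged. Fine-tuning the whole front-end for speaker verification, as the abstract describes, would require reimplementing these steps in a differentiable framework rather than NumPy.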

Tags:

conv-tasnet

embedding extractor

speaker verification

multisv

beamforming


Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

Few-Shot Lip-Password Based Speaker Verification

Multi-User Data Detection in Massive MIMO with 1-Bit ADCs

DEEP NEURAL MEL-SUBBAND BEAMFORMER FOR IN-CAR SPEECH SEPARATION
