11 May 2022

Convolution- and self-attention-based neural networks have both achieved excellent performance in automatic speaker verification. However, convolutional models often lack the ability to model long-term dependencies due to their limited receptive field, while self-attention models are insufficient for modeling local information. To address this limitation, we propose a new multi-layer-perceptron-based speaker verification network (MLP-SVNet) that applies MLPs across the temporal and frequency dimensions to capture local and global information simultaneously. Experimental results on VoxCeleb show that the proposed model is highly competitive with other systems based on convolution or self-attention. In addition, we demonstrate that MLP-SVNet produces complementary embeddings, which can be fused with a state-of-the-art system to further improve performance.
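
The abstract describes the core idea of applying MLPs along both the temporal and frequency axes. The sketch below is a minimal, illustrative block in that spirit (similar to an MLP-Mixer layer), not the authors' exact architecture: the class name `TimeFreqMLPBlock`, the layer sizes, and the fixed number of frames are assumptions made for the example.

```python
# A minimal sketch (not the paper's implementation) of an MLP block that mixes
# information along the temporal and frequency axes of a spectrogram-like input.
# All names and dimensions here are illustrative assumptions.
import torch
import torch.nn as nn


class TimeFreqMLPBlock(nn.Module):
    """Applies one MLP across the time axis and one across the frequency axis.

    Input shape: (batch, time, freq). The time-mixing MLP captures long-range
    (global) dependencies across frames; the frequency-mixing MLP models the
    local spectral structure within each frame.
    """

    def __init__(self, num_frames: int, num_freq: int, hidden: int = 256):
        super().__init__()
        self.norm1 = nn.LayerNorm(num_freq)
        self.time_mlp = nn.Sequential(
            nn.Linear(num_frames, hidden), nn.GELU(), nn.Linear(hidden, num_frames)
        )
        self.norm2 = nn.LayerNorm(num_freq)
        self.freq_mlp = nn.Sequential(
            nn.Linear(num_freq, hidden), nn.GELU(), nn.Linear(hidden, num_freq)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Time mixing: transpose so the MLP acts over the frame dimension.
        y = self.norm1(x).transpose(1, 2)          # (batch, freq, time)
        x = x + self.time_mlp(y).transpose(1, 2)   # residual connection
        # Frequency mixing: the MLP acts over the per-frame feature dimension.
        x = x + self.freq_mlp(self.norm2(x))
        return x


if __name__ == "__main__":
    # Example: 200 frames of 80-dim log-Mel features for a batch of 4 utterances.
    feats = torch.randn(4, 200, 80)
    block = TimeFreqMLPBlock(num_frames=200, num_freq=80)
    print(block(feats).shape)  # torch.Size([4, 200, 80])
```

In a speaker verification pipeline such a block would typically be stacked several times and followed by temporal pooling to produce a fixed-length speaker embedding; those details are omitted here.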
