MLP-SVNET: A MULTI-LAYER PERCEPTRONS BASED NETWORK FOR SPEAKER VERIFICATION
Bing Han, Zhengyang Chen, Bei Liu, Yanmin Qian
Convolution and self-attention based neural networks have both obtained excellent performance in automatic speaker verification. However, convolution models often struggle to model long-term dependencies due to their limited receptive field, while self-attention models are insufficient at modeling local information. To tackle these limitations, we propose a new multi-layer perceptrons based speaker verification network (MLP-SVNet), which applies MLPs across the temporal and frequency dimensions to capture local and global information at the same time. Experimental results on VoxCeleb show that the proposed model is highly competitive with other systems based on convolution or self-attention. In addition, we demonstrate that MLP-SVNet, built on multi-layer perceptrons, produces complementary embeddings that can be fused with a state-of-the-art system to further improve performance.
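To make the core idea concrete, below is a minimal, hypothetical sketch of an MLP block that mixes information along both the temporal and the frequency axis of a spectrogram-like input, in the spirit described by the abstract. The class name, hidden sizes, and block structure are assumptions for illustration and are not taken from the MLP-SVNet paper itself.

```python
# Hypothetical sketch: an MLP block applied across time and frequency.
# Names and dimensions are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn


class TimeFreqMLPBlock(nn.Module):
    """Mixes information along the time axis, then along the frequency axis."""

    def __init__(self, num_frames: int, num_freq_bins: int, hidden_dim: int = 256):
        super().__init__()
        # MLP applied across the temporal axis (long-term context).
        self.time_norm = nn.LayerNorm(num_freq_bins)
        self.time_mlp = nn.Sequential(
            nn.Linear(num_frames, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, num_frames),
        )
        # MLP applied across the frequency axis (local spectral patterns).
        self.freq_norm = nn.LayerNorm(num_freq_bins)
        self.freq_mlp = nn.Sequential(
            nn.Linear(num_freq_bins, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, num_freq_bins),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, freq)
        # Temporal mixing: transpose so the Linear acts over the time axis.
        y = self.time_norm(x).transpose(1, 2)       # (batch, freq, time)
        x = x + self.time_mlp(y).transpose(1, 2)    # residual connection
        # Frequency mixing: Linear acts over the frequency axis directly.
        x = x + self.freq_mlp(self.freq_norm(x))    # (batch, time, freq)
        return x


if __name__ == "__main__":
    features = torch.randn(8, 200, 80)              # e.g. 200 frames of 80-dim fbanks
    block = TimeFreqMLPBlock(num_frames=200, num_freq_bins=80)
    print(block(features).shape)                    # torch.Size([8, 200, 80])
```

One design note on this sketch: because the time-mixing MLP has a fixed input length, utterances would need to be cropped or padded to a fixed number of frames; how the actual MLP-SVNet handles variable-length input is not specified in the abstract.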