LEARNING DOMAIN-INVARIANT TRANSFORMATION FOR SPEAKER VERIFICATION
Hanyi Zhang, Longbiao Wang, Meng Liu, Jianwu Dang, Hui Chen, Kong Aik Lee
Length: 00:11:58
Automatic speaker verification (ASV) faces domain shift in real-world applications, caused by mismatches in intrinsic and extrinsic factors such as recording device and speaking style, which leads to unsatisfactory performance. To address this, we propose the meta generalized transformation, which uses meta-learning to build a domain-invariant embedding space. Specifically, the transformation module learns domain generalization knowledge through meta-optimization on meta-train and meta-test sets that are designed to simulate domain shift. Furthermore, distribution optimization is incorporated to supervise the metric structure of the embeddings. For the transformation module, we investigate various instantiations and observe that the multilayer perceptron with gating (gMLP) is the most effective, given its extrapolation capability. Experimental results in cross-genre and cross-dataset settings demonstrate that the meta generalized transformation dramatically improves the robustness of ASV systems to domain shift while outperforming state-of-the-art methods.
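As a rough illustration of how meta-train and meta-test sets can simulate domain shift, the sketch below holds out one or more domains (for example, genres) per episode so that the meta-test data comes from domains unseen in meta-train. The abstract does not say how the split is built, so the function name `split_domains`, the genre labels, and the toy data are all hypothetical assumptions, not the authors' procedure.

```python
import random


def split_domains(utterances_by_domain, n_meta_test=1, seed=None):
    """Split domains into disjoint meta-train and meta-test groups.

    Assumption: domain shift is simulated by holding out whole domains,
    so meta-test utterances come from domains absent in meta-train.
    """
    domains = sorted(utterances_by_domain)
    if not 0 < n_meta_test < len(domains):
        raise ValueError("need at least one domain on each side of the split")
    rng = random.Random(seed)
    held_out = set(rng.sample(domains, n_meta_test))
    meta_train = {d: utterances_by_domain[d] for d in domains if d not in held_out}
    meta_test = {d: utterances_by_domain[d] for d in domains if d in held_out}
    return meta_train, meta_test


# Hypothetical toy data: genre -> list of (speaker_id, utterance_id) pairs.
toy = {
    "interview": [("spk1", "u1"), ("spk2", "u2")],
    "speech": [("spk1", "u3"), ("spk3", "u4")],
    "singing": [("spk2", "u5"), ("spk3", "u6")],
}

meta_train, meta_test = split_domains(toy, n_meta_test=1, seed=0)
```

Resampling the held-out domains in every episode would expose the transformation module to many different simulated shifts; whether the original work resamples per episode or per batch is not stated in the abstract.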