  • SPS Members: Free
  • IEEE Members: $11.00
  • Non-members: $15.00
  • Length: 15:12
04 May 2020

This paper presents an improved deep embedding learning method based on a convolutional neural network (CNN) for text-independent speaker verification. Two improvements to x-vector embedding learning are proposed: (1) multiscale convolution (MSCNN) is adopted in the frame-level layers to capture complementary speaker information across different receptive fields; (2) a Baum-Welch statistics attention (BWSA) mechanism is applied in the temporal pooling layer so that it integrates more useful long-term speaker characteristics. Experiments are carried out on the NIST SRE16 evaluation set. The results demonstrate the effectiveness of the MSCNN and show that the proposed BWSA further improves the performance of the DNN embedding system.
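
The two ideas in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the function names, the single-channel input, the fixed averaging kernels that stand in for learned filters, and the kernel widths (1, 3, 5). The abstract does not say how BWSA derives attention weights from Baum-Welch statistics, so the pooling function simply takes per-frame scores as an input and applies a generic softmax-weighted mean and standard deviation.

```python
import math

def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.
    Assumes an odd kernel width."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [sum(k * padded[t + i] for i, k in enumerate(kernel))
            for t in range(len(signal))]

def multiscale_features(signal, widths=(1, 3, 5)):
    """Run parallel branches with different receptive fields and concatenate
    their outputs per frame. Averaging kernels are placeholders for learned
    filters (an assumption, not taken from the paper)."""
    branches = [conv1d_same(signal, [1.0 / w] * w) for w in widths]
    return [list(frame) for frame in zip(*branches)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_stats_pooling(frames, scores):
    """Attention-weighted mean and standard deviation over time.
    `scores` is a hypothetical per-frame relevance score; in the paper the
    attention is derived from Baum-Welch statistics, which is not modeled here."""
    weights = softmax(scores)
    dim = len(frames[0])
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    var = [sum(w * (f[d] - mean[d]) ** 2 for w, f in zip(weights, frames))
           for d in range(dim)]
    std = [math.sqrt(max(v, 0.0)) for v in var]
    return mean + std  # fixed-length utterance-level vector

# Usage: four frames of a one-dimensional feature, uniform attention scores.
signal = [1.0, 2.0, 3.0, 4.0]
feats = multiscale_features(signal)            # 4 frames x 3 scales
pooled = attentive_stats_pooling(feats, [0.0] * len(feats))
print(len(feats), len(feats[0]), len(pooled))
```

With uniform scores the pooling reduces to the plain mean and standard deviation used in standard x-vector statistics pooling; non-uniform scores shift the statistics toward the frames judged more informative.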
