11 Jun 2021

Most automatic speaker verification (ASV) systems are vulnerable to spoofing attacks. To address this issue, in this article we propose a novel model based on an attention-enhanced DenseNet-BiLSTM network and segment-based linear filter bank features. First, silent segments are selected from each speech signal using the short-term zero-crossing rate and energy. If the silent segments together contain too little data, the decaying tails are selected instead. Second, linear filter bank features are extracted from the selected segments in the relatively high-frequency domain. Finally, an attention-enhanced DenseNet-BiLSTM architecture that can avoid overfitting is built. We validate the model on two datasets, BTAS2016 and ASVspoof2017. Experiments show that the attention-enhanced DenseNet-BiLSTM model with the segment-based linear filter bank features achieves the best performance. Compared with the baseline system based on constant-Q cepstral coefficients and a Gaussian mixture model (GMM), the proposed model yields relative improvements of 91.68% and 74.04% on the two datasets, respectively.
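
The silent-segment selection step can be illustrated with a minimal sketch. The abstract does not give frame sizes, thresholds, or the exact decision rule, so everything below is an assumption made for illustration: the function names, the 400-sample frames with a 160-sample hop, the relative energy threshold `energy_ratio`, the ceiling `zcr_max`, and the rule that a frame counts as silent when both its energy and its zero-crossing rate are low. It is not the authors' implementation.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_term_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def select_silent_frames(x, frame_len=400, hop=160,
                         energy_ratio=0.05, zcr_max=0.25):
    """Return a boolean mask marking frames treated as silent.

    Assumed rule (not from the source): a frame is silent when its energy
    is below energy_ratio times the maximum frame energy AND its
    zero-crossing rate is below zcr_max.
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = short_term_energy(frames)
    zcr = zero_crossing_rate(frames)
    return (energy < energy_ratio * energy.max()) & (zcr < zcr_max)
```

Calling `select_silent_frames(x)` on a mono waveform `x` returns one boolean per frame. A fallback to the decaying tails, as the abstract describes, would be triggered when too few frames are marked silent; the criterion for "too few" is not stated in the abstract and is left out here.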

Chairs:
Daniele Giacobello
