Synthetic speech detection based on local autoregression and variance statistics

Sanshuai Cui (Sun Yat-sen University); Bingyuan Huang (Sun Yat-sen University); Jiwu Huang (Shenzhen University); Xiangui Kang (Sun Yat-sen University)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
09 Jun 2023

As speech synthesis technology advances, existing synthetic speech detection (SSD) methods generalize poorly to unknown synthesis algorithms. This makes SSD a challenging speech forensics task that has attracted considerable attention. We observe that speech synthesis always involves resampling and pooling/smoothing operations, which alter the local autoregressive (AR) properties and the statistical distribution of the speech. Based on AR modeling and standard deviation statistics, we propose novel front-end speech features, abbreviated ARS, as the input to an SSD classifier. We also construct a new back-end classifier based on dense convolution and short connections, which we name scDenseNet. Experimental results on the ASVspoof2019 logical access (LA) dataset show that ARS is a strong representation that is sensitive to spoofing attacks and achieves promising SSD performance. The proposed scDenseNet outperforms its predecessor DenseNet on both EER and t-DCF, and achieves the best performance among the state-of-the-art classifiers studied in this paper. Furthermore, with scDenseNet as the back end, fusing ARS with popular features such as linear frequency cepstral coefficients (LFCC) significantly improves performance and yields an EER of 0.98%.
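The abstract does not give the exact definition of the ARS features, so the following is only a minimal sketch of the general idea it names: per-frame (local) AR coefficients combined with a per-frame standard deviation. The function names, the least-squares AR fit, and the frame length, hop, and model order are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def ar_coefficients(frame, order):
    """Least-squares fit of x[n] ~ sum_{k=1..order} a[k] * x[n-k]."""
    n = len(frame)
    # Column k-1 holds x[n-k] for every target sample x[n], n >= order.
    X = np.column_stack([frame[order - k : n - k] for k in range(1, order + 1)])
    y = frame[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

def local_ar_std_features(x, frame_len=400, hop=160, order=8):
    """Hypothetical feature matrix: AR coefficients plus std for each frame.

    The defaults (400-sample frames, 160-sample hop, order 8) are
    illustrative assumptions, not values taken from the paper.
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    ar = np.stack([ar_coefficients(f, order) for f in frames])
    std = frames.std(axis=1, keepdims=True)
    return np.hstack([ar, std])  # shape: (n_frames, order + 1)
```

Calling `local_ar_std_features(waveform)` on a 1-D array of samples returns one row per frame, which could then be fed to a classifier. Operations such as resampling or smoothing change how well each sample is predicted from its neighbours, which is why per-frame AR coefficients and spread statistics are plausible cues.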
