Synthetic speech detection based on local autoregression and variance statistics

Sanshuai Cui (Sun Yat-sen University); Bingyuan Huang (Sun Yat-sen University); Jiwu Huang (Shenzhen University); Xiangui Kang (Sun Yat-sen University)

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

09 Jun 2023

With the development of speech synthesis technology, existing synthetic speech detection (SSD) methods generalize poorly to unknown synthesis algorithms. This makes SSD a challenging speech forensics task, and one that has attracted considerable research interest. We observe that speech synthesis always includes resampling and pooling/smoothing operations, which alter the local autoregressive (AR) characteristics and the statistical distribution of speech. In this paper, based on AR modeling and standard deviation statistics, we propose a novel front-end speech feature, ARS for short, as the input to an SSD classifier. In addition, we construct a new back-end classifier based on dense convolution and short connections, which we name scDenseNet. Experimental results on the ASVspoof2019 logical access (LA) dataset demonstrate that ARS has strong representational power and sensitivity to spoofing attacks, and achieves promising SSD performance. The proposed scDenseNet outperforms its predecessor, DenseNet, on both EER and t-DCF scores, and achieves the best performance among the state-of-the-art classifiers compared in this paper. Furthermore, with scDenseNet as the classifier, fusing ARS with popular features such as linear frequency cepstral coefficients (LFCC) significantly improves performance, yielding an EER of 0.98%.
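The abstract does not spell out how ARS is computed, so the following is only a minimal sketch of the underlying intuition, not the authors' feature: a smoothing step changes the local AR coefficients and the spread of the AR prediction residual. It fits a low-order AR model to each frame with the Yule-Walker equations (solved by Levinson-Durbin) and records the residual standard deviation. The frame length (256), hop (128), AR order (4), the 3-tap moving average used as a stand-in for synthesis-style smoothing, and the helper names `local_ar_features`, `levinson_durbin`, and `column_mean` are all hypothetical choices made for illustration.

```python
import math
import random


def autocorr(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of a zero-mean frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for AR(order) coefficients.

    Returns (a, err), where x[t] is predicted as sum(a[k] * x[t - 1 - k]).
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:  # degenerate (e.g. silent) frame: leave remaining coefficients at 0
            break
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err  # reflection coefficient for stage i + 1
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def local_ar_features(signal, frame_len=256, hop=128, order=4):
    """Hypothetical helper: per-frame [AR coefficients..., residual std] vectors."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mean = sum(frame) / frame_len
        frame = [v - mean for v in frame]  # remove DC offset before AR fitting
        coeffs, _ = levinson_durbin(autocorr(frame, order), order)
        # Prediction residual e[t] = x[t] - sum_k a[k] * x[t - 1 - k]
        resid = [frame[t] - sum(coeffs[k] * frame[t - 1 - k] for k in range(order))
                 for t in range(order, frame_len)]
        m = sum(resid) / len(resid)
        std = math.sqrt(sum((e - m) ** 2 for e in resid) / len(resid))
        feats.append(coeffs + [std])
    return feats


def column_mean(feats, dim):
    """Average one feature dimension over all frames."""
    return sum(f[dim] for f in feats) / len(feats)


if __name__ == "__main__":
    rng = random.Random(0)
    white = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    # 3-tap moving average: a crude stand-in for a pooling/smoothing step
    smooth = [(white[t] + white[t - 1] + white[t - 2]) / 3.0 for t in range(2, len(white))]
    for name, sig in (("white", white), ("smoothed", smooth)):
        f = local_ar_features(sig)
        print(name, "mean a1 = %.2f" % column_mean(f, 0),
              "mean residual std = %.2f" % column_mean(f, 4))
```

For white noise the first AR coefficient stays near zero, while the smoothed signal yields a large positive first coefficient and a smaller residual spread. Per-frame vectors like these could then be stacked into a time-feature map for a back-end classifier, although the paper's actual ARS construction may differ.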

Tags:

Image, Video, and Multidimensional Signal Processing

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

Recallable Question Answering-based Re-ranking Considering Semantic Region for Cross-modal Retrieval

Self-Supervised Learning Based Anomaly Detection in Synthetic Aperture Radar Imaging

Selective Listening by Synchronizing Speech With Lips
