Learning From Yourself: A Self-Distillation Method for Fake Speech Detection

Jun Xue (Anhui Province Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University); Cunhang Fan (Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University); Jiangyan Yi (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences); Chenglong Wang (CASIA); Zhengqi Wen (Qiyuan Laboratory); Dan Zhang (Department of Psychology, Tsinghua University); Zhao Lv (Anhui University)

07 Jun 2023

In this paper, we propose a novel self-distillation method for fake speech detection (FSD), which can significantly improve FSD performance without increasing model complexity. For FSD, fine-grained information such as spectrogram defects and mute segments is very important, and this information is often perceived by shallow networks. However, shallow networks are noisy and cannot capture it well. To address this problem, we propose using the deepest network to instruct the shallow networks, thereby enhancing them. Specifically, the FSD network is divided into several segments: the deepest network serves as the teacher model, and each shallow network becomes a student model by adding a classifier. In addition, a distillation path between the deepest network's features and the shallow networks' features is used to reduce the feature difference. A series of experiments on the ASVspoof 2019 LA and PA datasets shows the effectiveness of the proposed method, with significant improvements over the baseline.
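The training objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss terms, so everything specific here is an assumption: cross-entropy on the hard label, a temperature-softened KL term from the deepest classifier to each shallow classifier, a mean-squared feature term, and the hypothetical weights `alpha`, `beta`, and `temperature`. Features are assumed to be already projected to a common dimension.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Hard-label cross-entropy for a single example."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def feature_mse(a, b):
    """Mean squared difference between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def self_distillation_loss(student_outputs, teacher_output, label,
                           alpha=0.5, beta=0.1, temperature=2.0):
    """Combine teacher and student losses for one example.

    student_outputs: list of (logits, feature), one per shallow segment.
    teacher_output:  (logits, feature) from the deepest segment.
    alpha, beta, temperature are assumed values, not taken from the paper.
    """
    teacher_logits, teacher_feature = teacher_output
    # The deepest network is trained on the ground-truth label only.
    loss = cross_entropy(teacher_logits, label)
    teacher_soft = softmax(teacher_logits, temperature)
    for logits, feature in student_outputs:
        hard = cross_entropy(logits, label)                  # label supervision
        soft = kl_divergence(teacher_soft,
                             softmax(logits, temperature))   # teacher guidance
        feat = feature_mse(feature, teacher_feature)         # feature distillation path
        loss += (1 - alpha) * hard + alpha * soft + beta * feat
    return loss

if __name__ == "__main__":
    # Two classes (e.g. bona fide vs. spoof), two shallow students, toy values.
    teacher = ([2.0, -1.0], [0.5, 0.1, -0.3])
    students = [([0.3, 0.1], [0.2, 0.0, 0.1]),
                ([1.2, -0.4], [0.4, 0.1, -0.2])]
    print(self_distillation_loss(students, teacher, label=0))
```

Because the extra classifiers are only needed to compute this loss, they can be discarded after training, which is consistent with the claim that model complexity does not increase at inference time.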

Tags:

Speaker verification and anti-spoofing

Presented at: IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person conference)
