Reliability Estimation for Synthetic Speech Detection
Davide Salvi (Politecnico di Milano); Paolo Bestagini (Politecnico di Milano); Stefano Tubaro (Politecnico di Milano, Italy)
SPS
Recent advances in speech synthesis and counterfeit audio generation have pushed the multimedia forensics community to develop speech deepfake detection techniques that can counter the threats posed by such fake content.
Although synthetic speech detectors show excellent performance in controlled conditions, they are not always reliable in open-set settings, i.e., when evaluated on data that differ substantially from those seen during training.
This can lead to misleading scores and uninformative results in real-world use.
In this paper, we propose a method for estimating the reliability of a prediction performed by a speech deepfake detector.
This allows us to perform detection only on the most relevant portions of a signal, i.e., the time windows on which we obtain the most reliable scores, which increases the final accuracy of the resulting system.
Since some audio fragments may not contain enough traces for the task at hand and can negatively affect the system output, a reliability estimator allows us to discard them and focus only on the most pertinent data.
The proposed method improves the performance of the considered detector and shows excellent generalization capabilities on unseen datasets.
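The window-selection idea described above can be illustrated with a minimal Python sketch. The abstract does not specify how reliability is estimated or how the per-window scores are combined, so this sketch assumes that each time window already comes with a detector score and a reliability value in [0, 1], both produced by hypothetical upstream models. It keeps only the windows whose reliability reaches a threshold and averages their scores. `WindowPrediction`, `aggregate_reliable`, the 0.5 threshold, and the mean-based fusion are illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class WindowPrediction:
    """Output for one time window (hypothetical structure, for illustration)."""
    score: float        # detector output, e.g. probability that the window is fake
    reliability: float  # assumed reliability estimate of that score, in [0, 1]


def aggregate_reliable(
    windows: Sequence[WindowPrediction],
    threshold: float = 0.5,  # hypothetical reliability cut-off
) -> float:
    """Average the scores of the windows whose reliability reaches `threshold`.

    If no window is reliable enough, fall back to the mean over all windows so a
    signal-level score can always be produced (an assumption, not from the paper).
    """
    if not windows:
        raise ValueError("at least one window is required")
    # Discard windows whose scores are deemed unreliable.
    kept = [w.score for w in windows if w.reliability >= threshold]
    if not kept:
        kept = [w.score for w in windows]
    # Simple mean fusion of the retained window scores (assumed aggregation rule).
    return sum(kept) / len(kept)


if __name__ == "__main__":
    windows = [
        WindowPrediction(score=0.75, reliability=0.9),
        WindowPrediction(score=0.125, reliability=0.2),  # discarded: low reliability
        WindowPrediction(score=0.25, reliability=0.7),
    ]
    print(aggregate_reliable(windows))  # mean of 0.75 and 0.25
```

In this example the middle window is dropped, so the signal-level score is the mean of the two remaining windows (0.5). Replacing the mean with a reliability-weighted average would be an equally plausible design choice; the abstract does not say which fusion rule is used.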