TOWARDS ROBUST SPEECH-TO-TEXT ADVERSARIAL ATTACK
Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 00:12:16
This paper introduces a novel adversarial algorithm for attacking advanced speech-to-text transcription systems. Our proposed approach extends the conventional distortion condition of the general adversarial optimization formulation using the Cramér integral probability metric. Minimizing this metric helps craft signals that lie very close to the subspace of legitimate speech recordings. This yields adversarial signals that are more robust to over-the-air playback, without relying on either costly expectation over transformations or static room impulse response simulations. Our approach considerably outperforms other targeted and non-targeted algorithms in terms of word error rate and sentence-level accuracy. Furthermore, compared with seven other strong white-box and black-box adversarial attacks, our proposed approach is considerably more resilient to multiple consecutive over-the-air playbacks, corroborating its higher robustness in noisy environments.
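
To give a rough sense of the kind of quantity the Cramér metric measures, the following minimal sketch computes the squared Cramér distance between two one-dimensional empirical samples, that is, the integral of the squared difference between their cumulative distribution functions. It is an illustration only and not the authors' loss or optimization procedure: the function name `cramer_distance` and the toy inputs are assumptions introduced here, and the abstract does not say how the metric is applied to speech signals.

```python
from bisect import bisect_right


def cramer_distance(xs, ys):
    """Squared Cramér distance between the empirical CDFs of two 1-D samples.

    Hypothetical helper for illustration; computes the integral of
    (F(t) - G(t))^2 over t, where F and G are the empirical CDFs.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    # Both empirical CDFs are piecewise constant between consecutive sample points,
    # so the integral reduces to a finite sum over those intervals.
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        f = bisect_right(xs, left) / len(xs)  # value of F on [left, right)
        g = bisect_right(ys, left) / len(ys)  # value of G on [left, right)
        total += (f - g) ** 2 * (right - left)
    return total


if __name__ == "__main__":
    # Identical samples have zero distance; differing samples have a positive one.
    print(cramer_distance([0.0, 1.0], [0.0, 1.0]))
    print(cramer_distance([0.0, 2.0], [1.0]))
```

In this toy setting the distance is zero exactly when the two empirical distributions coincide, which is the intuition behind using such a metric to keep a crafted signal close to legitimate recordings.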