WASSERTRAIN: AN ADVERSARIAL TRAINING FRAMEWORK AGAINST WASSERSTEIN ADVERSARIAL ATTACKS
Qingye Zhao, Xin Chen, Enyi Tang, Xuandong Li, Zhuoyu Zhao
This paper presents WasserTrain, an adversarial training framework for improving model robustness against adversarial attacks measured by the Wasserstein distance. First, an effective attack method, WasserAttack, is introduced with a novel encoding of the optimization problem, which directly searches for the worst-case point within the Wasserstein ball while keeping the relaxation error of the Wasserstein transformation as small as possible. The adversarial training framework then uses these high-quality adversarial examples to train robust models. Experiments on MNIST show that the adversarial loss induced by examples found with our method is about three times that found by the PGD-based attack. Furthermore, within the Wasserstein ball with a radius of 0.5, the WasserTrain model achieves 31% adversarial robustness against WasserAttack, which is 22% higher than that of the PGD-trained model.
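To make the overall procedure concrete, the sketch below shows a generic adversarial training loop in which the inner maximization is delegated to a Wasserstein-constrained attack. The function `wasserstein_attack` is a hypothetical placeholder standing in for WasserAttack; the paper's actual encoding of the optimization problem is not reproduced here, and all names and parameters in this sketch are assumptions for illustration only.

```python
# Minimal sketch of adversarial training with a Wasserstein-constrained inner attack.
# `wasserstein_attack` is a placeholder, NOT the authors' WasserAttack implementation.
import torch
import torch.nn.functional as F


def wasserstein_attack(model, x, y, radius=0.5):
    """Placeholder: should return a perturbed batch whose Wasserstein distance
    from x is at most `radius`. A real implementation would solve the
    constrained worst-case optimization described in the paper."""
    return x  # identity stand-in so the sketch runs end to end


def adversarial_train(model, loader, epochs=10, radius=0.5, lr=1e-3, device="cpu"):
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            # Inner maximization: find a worst-case example inside the Wasserstein ball.
            x_adv = wasserstein_attack(model, x, y, radius=radius)
            # Outer minimization: update the model on the adversarial example.
            opt.zero_grad()
            loss = F.cross_entropy(model(x_adv), y)
            loss.backward()
            opt.step()
    return model
```

The key design point, per the abstract, is that the quality of `x_adv` (how close it is to the true worst case within the Wasserstein ball) drives the robustness of the trained model.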