Improving Adversarial Transferability via Feature Translation
Yoonji Kim, Seungju Cho, Junyoung Byun, Myung-Joon Kwon, Changick Kim
Deep neural networks (DNNs) are vulnerable to adversarial examples, which are crafted to cause a model to make wrong predictions. In real-world scenarios, the adversary cannot access the target model, so black-box attacks have attracted great attention. Among them, transfer-based attacks have been widely studied because they can effectively attack unknown target models. However, transfer-based attacks often fail to fool models whose activation maps differ slightly from the source model's, because adversarial examples tend to overfit to the source model. To alleviate this problem, we introduce the Feature Translation Attack (FTA), which applies translation to intermediate features during the optimization process. Specifically, FTA generates a new adversarial example whose feature is similar to the ensemble of translated features from the existing adversarial example. FTA achieves better performance than state-of-the-art methods in extensive experiments.
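The ensemble-of-translated-features objective described in the abstract can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the paper's implementation: the feature maps, the shift range `max_shift`, and the plain L2 feature loss are all hypothetical stand-ins for whatever feature extractor and loss the actual attack uses.

```python
import numpy as np

def translated_feature_ensemble(feature, max_shift=3):
    """Average spatially translated copies of a feature map of shape (C, H, W).

    Hypothetical sketch: translations are implemented with circular shifts
    (np.roll) over the spatial axes; the real method may pad or crop instead.
    """
    shifts = [(dy, dx)
              for dy in range(-max_shift, max_shift + 1)
              for dx in range(-max_shift, max_shift + 1)]
    translated = [np.roll(feature, (dy, dx), axis=(1, 2)) for dy, dx in shifts]
    return np.mean(translated, axis=0)

def fta_feature_loss(adv_feature, ref_feature, max_shift=3):
    """L2 distance between the new adversarial example's feature and the
    ensemble of translated features from the existing adversarial example."""
    target = translated_feature_ensemble(ref_feature, max_shift)
    return float(np.sum((adv_feature - target) ** 2))
```

In a full attack, `adv_feature` would be produced by a differentiable feature extractor and this loss minimized with respect to the input image (e.g., by iterative gradient steps), so that the resulting perturbation is less overfit to any single spatial alignment of the source model's features.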