Interpretation of Neural Networks is Susceptible to Universal Adversarial Perturbations

Haniyeh Ehsani Oskouie (Sharif University of Technology); Farzan Farnia (The Chinese University of Hong Kong)

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

07 Jun 2023

Interpreting neural network classifiers using gradient-based saliency maps has been extensively studied in the deep learning literature. While the existing algorithms manage to achieve satisfactory performance in application to standard image recognition datasets, recent works demonstrate the vulnerability of widely-used gradient-based interpretation schemes to norm-bounded perturbations adversarially designed for every individual input sample. However, such adversarial perturbations are commonly designed using the knowledge of an input sample, and hence perform sub-optimally in application to an unknown or constantly changing data point. In this paper, we show the existence of a Universal Perturbation for Interpretation (UPI) for standard image datasets, which can alter a gradient-based feature map of neural networks over a significant fraction of test samples. To design such a UPI, we propose a gradient-based optimization method as well as a principal component analysis (PCA)-based approach to compute a UPI which can effectively alter a neural network's gradient-based interpretation on different samples. We support the proposed UPI approaches by presenting several numerical results of their successful applications to standard image datasets.

Tags:

adversarial machine learning

Interpretation of Neural Networks is Susceptible to Universal Adversarial Perturbations

Haniyeh Ehsani Oskouie (Sharif University of Technology); Farzan Farnia (The Chinese University of Hong Kong)

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

GNP ATTACK: TRANSFERABLE ADVERSARIAL EXAMPLES VIA GRADIENT NORM PENALTY

Measuring the Transferability of L-infty Attacks by the L-2 Norm

SC-NET: SALIENT POINT AND CURVATURE BASED ADVERSARIAL POINT CLOUD GENERATION NETWORK

Join the IEEE Signal Processing Society