Forensics for Adversarial Machine Learning through Attack Mapping Identification
Allen H Yan (Oregon State University); Jinsub Kim (Oregon State University); Raviv Raich (Oregon State University)
SPS
This paper considers the problem of post-attack forensic analysis for a test-time attack on a machine learning model. A test-time attack can be represented as a mapping that takes a benign test example as input and outputs a falsified version of it. Given a set of attacked examples collected after the attack, our objective is to identify the correct attack mapping among a collection of candidate attack strategies with diverse objectives and constraints. We present an attack mapping identification method that uses a pre-attack example recovery mechanism for feature extraction. In experiments on the MNIST dataset, we demonstrate the effectiveness of the proposed approach in identifying the correct attack mapping among 12 candidate attack strategies.
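
The idea of treating an attack as a mapping and using recovered pre-attack examples as features can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' method: the three candidate mappings in `ATTACKS`, the nearest-prototype `recover` function standing in for the recovery mechanism, the synthetic 4-dimensional data in place of MNIST, and the nearest-signature decision rule.

```python
import numpy as np

# Hypothetical candidate attack mappings (not from the paper):
# each takes benign examples and returns falsified versions.
ATTACKS = {
    "shift_up": lambda x: x + 1.0,
    "shift_down": lambda x: x - 1.0,
    "scale": lambda x: 1.2 * x,
}

# Toy stand-in for the structure of benign data: four well-separated prototypes.
PROTOTYPES = 10.0 * np.eye(4)


def sample_benign(n, rng):
    """Draw n benign examples as noisy copies of randomly chosen prototypes."""
    labels = rng.integers(0, len(PROTOTYPES), size=n)
    noise = 0.1 * rng.standard_normal((n, PROTOTYPES.shape[1]))
    return PROTOTYPES[labels] + noise


def recover(attacked):
    """Assumed recovery mechanism: map each attacked example to its nearest prototype."""
    dists = np.linalg.norm(attacked[:, None, :] - PROTOTYPES[None, :, :], axis=2)
    return PROTOTYPES[dists.argmin(axis=1)]


def residual_feature(attacked):
    """Feature: mean difference between attacked examples and their recovered versions."""
    return (attacked - recover(attacked)).mean(axis=0)


def identify_attack(attacked, reference_benign):
    """Pick the candidate whose simulated residual signature is closest to the observed one."""
    observed = residual_feature(attacked)
    scores = {
        name: np.linalg.norm(observed - residual_feature(attack(reference_benign)))
        for name, attack in ATTACKS.items()
    }
    return min(scores, key=scores.get)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = sample_benign(200, rng)
    attacked = ATTACKS["scale"](sample_benign(200, rng))
    print(identify_attack(attacked, reference))
```

In this sketch the residual between an attacked example and its recovered version acts as a fingerprint of the mapping that produced it. A practical system would presumably need a learned recovery model and a trained classifier over such features, since real attacks are far less separable than these toy mappings.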