Dr. Nhien-An Le-Khac, Dr. Aditya Kuppa
IEEE Members: $11.00
Non-members: $15.00
Pages/Slides: 23
Machine learning methods are essential for addressing cybersecurity threats, and explaining the decision process of black-box classifiers is critical to their successful adoption. Counterfactual explanations have emerged as a popular approach to understanding why a black-box model makes a confident decision: they highlight alternative data instances that would change the outcome. However, recent research in Explainable Artificial Intelligence (XAI) has focused on improving explainability methods, attacks on interpreters, and defining properties of model explanations, overlooking the new attack surfaces that explanations themselves can introduce. Adversaries can exploit explanations to launch attacks that compromise system privacy and integrity; understanding these risks and developing strategies to mitigate them is necessary to ensure the security of AI systems. In this webinar, we will explore the cybersecurity properties and threat models associated with counterfactual explanations. We will examine new black-box attacks that exploit XAI methods to compromise the confidentiality and privacy of the underlying classifiers, and we will discuss the significance of these attacks in the context of Large Language Models.
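To make the idea of a counterfactual explanation concrete, here is a minimal sketch (not taken from the webinar material): for a linear classifier f(x) = sign(w·x + b), the closest counterfactual is the input projected just past the decision hyperplane. The weights, bias, and input below are hypothetical values chosen purely for illustration.

```python
import numpy as np

# Hypothetical "trained" linear classifier f(x) = sign(w.x + b)
w = np.array([2.0, -1.0])
b = -0.5

def predict(x):
    """Binary label from the sign of the decision function."""
    return int(w @ x + b > 0)

def counterfactual(x, eps=1e-2):
    """Smallest L2 perturbation that flips the label: project x onto the
    decision hyperplane w.x + b = 0 and step slightly (eps) past it."""
    signed = (w @ x + b) / (w @ w)   # signed distance scaled by ||w||^2
    return x - (1 + eps) * signed * w

x = np.array([1.5, 0.5])   # classified as 1 (decision value 2.0)
x_cf = counterfactual(x)   # nearby instance classified as 0
```

Nonlinear black-box models require iterative search (e.g. gradient steps or query-based optimization) rather than this closed-form projection, and it is exactly such query access that the attacks discussed in the webinar exploit.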