  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:05:32
08 May 2022

Recently, Integrated Gradients (IG)-based methods have been commonly used to explain the decision process of deep neural networks (DNNs). However, they consider only the information of the predicted class and neglect the information of the remaining classes. In this paper, we propose a novel counterfactual explanation method, Discriminative Gradients (DiscGrad), that derives explainable discriminative attributes by considering not only the predicted class but also the counterfactual classes. Specifically, we calculate the discriminative attributes by removing the attributes of the counterfactual classes, which makes it possible to derive only the key discriminative attributes that contrast with other decisions. We also determine the weights for the discriminative attributes using the degree of confusion about the counterfactual classes. We evaluate our method by measuring how much the logit decreases when important attributes are perturbed. Experimental results on widely used image and text datasets show that our proposed method outperforms the strong baseline, IG. In addition, we examine the relationship between class correlation and the performance of the discriminative attributes to demonstrate the effectiveness of our method.
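
The abstract does not give the formulas, so the following is only a rough sketch of the idea and not the authors' implementation. It computes standard Integrated Gradients on a hypothetical toy model, then subtracts the attributions of each counterfactual class from those of the predicted class. The toy model (`W`, `logits`), the zero baseline, the subtraction rule, and the use of normalized softmax probabilities as "degree of confusion" weights are all assumptions made for illustration.

```python
import math

# Hypothetical toy model (an assumption, not from the paper):
# 3 classes, 4 input features, logit_c = sum_i W[c][i] * tanh(x_i).
W = [
    [1.0, -0.5, 0.3, 0.0],
    [0.8, 0.4, -0.2, 0.1],
    [-0.3, 0.2, 0.9, -0.6],
]

def logits(x):
    return [sum(w * math.tanh(v) for w, v in zip(row, x)) for row in W]

def grad_logit(x, c):
    # Analytic gradient of logit c with respect to each input feature.
    return [w * (1.0 - math.tanh(v) ** 2) for w, v in zip(W[c], x)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def integrated_gradients(x, c, baseline=None, steps=200):
    # Standard IG: (x - baseline) times the average gradient along the
    # straight path from baseline to x (midpoint Riemann sum).
    if baseline is None:
        baseline = [0.0] * len(x)
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (v - b) for v, b in zip(x, baseline)]
        g = grad_logit(point, c)
        total = [t + gi for t, gi in zip(total, g)]
    return [(v - b) * t / steps for v, b, t in zip(x, baseline, total)]

def discriminative_attributions(x):
    # Assumed combination rule: predicted-class IG minus a
    # confusion-weighted sum of the counterfactual-class IGs.
    z = logits(x)
    p = softmax(z)
    pred = max(range(len(z)), key=z.__getitem__)
    attr = integrated_gradients(x, pred)
    others = [c for c in range(len(z)) if c != pred]
    mass = sum(p[c] for c in others)
    for c in others:
        weight = p[c] / mass  # assumed "degree of confusion" weight
        cf = integrated_gradients(x, c)
        attr = [a - weight * b for a, b in zip(attr, cf)]
    return pred, attr

x = [1.2, -0.7, 0.4, 0.9]
pred, disc = discriminative_attributions(x)
print("predicted class:", pred)
print("discriminative attributions:", [round(a, 4) for a in disc])
```

Under these assumptions, a feature that supports the predicted class and a confusable class equally receives a small discriminative score, while a feature that supports only the predicted class keeps a large one. That matches the contrastive reading described in the abstract.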
