9:40 AM - 10:00 AM
[2A1-GS-2-03] Self-Examination Mechanism: Lightweight Defense Mechanism against Adversarial Examples using Explainable AI
Keywords: Adversarial example, Explainable AI, Image classification
Deep learning-based image classification models are vulnerable to adversarial examples (AEs). Existing defense methods improve classification accuracy on AEs, but at the cost of reduced accuracy on clean, unperturbed images. To solve this problem, we propose a new defense mechanism called the self-examination mechanism. In the proposed method, the input image is first classified. The inference process of the classification model is then verified using SHapley Additive exPlanations (SHAP), an explainable-AI method. If the input image appears abnormal, classification is performed again based on the SHAP output. In this way, misclassification of AEs can be prevented without significantly reducing the classification accuracy on clean images. Evaluations on ResNet and WideResNet trained on CIFAR-10 demonstrate that our method improves accuracy on AEs while barely reducing accuracy on clean images.