JSAI2023

Presentation information

General Session

[2L1-GS-11] AI and Society

Wed. Jun 7, 2023 9:00 AM - 10:40 AM Room L (C2)

Chair: Michal Ptaszynski (Kitami Institute of Technology) [Online]

9:40 AM - 10:00 AM

[2L1-GS-11-03] A Study on Black-Box Adversarial Attack for Image Interpreters

〇Yudai Hirose¹, Ayane Tajima¹, Satoshi Ono¹ (1. Kagoshima University)

Keywords: eXplainable AI, Adversarial Examples, Image Recognition, Deep Learning

Deep neural networks (DNNs) are widely used in various fields and are increasingly applied to real-world problems, including tasks that support human decision-making. In such settings, however, issues such as the fairness of output results, ethical validity, and the opacity of the model have arisen. To mitigate these problems, eXplainable AI (XAI), which explains the basis of a DNN's reasoning, is being actively studied. At the same time, DNN-based models have been shown to be vulnerable to Adversarial Examples (AEs): inputs with special perturbations, imperceptible to humans, that cause erroneous decisions. Such vulnerabilities have been confirmed to exist in image interpreters such as Grad-CAM, and investigating them is essential for using these interpreters safely. In this study, we propose an adversarial attack method that uses evolutionary computation to generate AEs that produce incorrect interpretations under black-box conditions, where the internal structure of the attacked model is unavailable. Experimental results showed that the proposed method successfully generated AEs that mislead the interpretation results without changing the classification results of the image recognition model.
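
The setting described above can be framed as black-box optimization: the attacker can only query the recognition model and the interpreter, and must evolve a perturbation that shifts the interpretation while leaving the classification intact. The Python sketch below illustrates this general idea with a simple (1+1) evolution strategy; the abstract does not specify the authors' actual algorithm or fitness design, so the names classify and explain and the fitness function here are hypothetical stand-ins, not the proposed method.

import numpy as np

def evolve_ae(x, classify, explain, eps=8 / 255, sigma=0.05,
              n_iter=2000, seed=None):
    # (1+1) evolution strategy: evolve a perturbation delta with
    # ||delta||_inf <= eps that maximizes the change in the interpreter's
    # saliency map while keeping the predicted label fixed.
    rng = np.random.default_rng(seed)
    y0 = classify(x)   # label that must be preserved
    m0 = explain(x)    # interpretation of the clean input

    def fitness(delta):
        x_adv = np.clip(x + delta, 0.0, 1.0)
        if classify(x_adv) != y0:
            return -np.inf  # reject candidates that change the label
        # reward divergence from the original saliency map
        return np.linalg.norm(explain(x_adv) - m0)

    delta = np.zeros_like(x)
    best = fitness(delta)
    for _ in range(n_iter):
        cand = np.clip(delta + sigma * rng.standard_normal(x.shape),
                       -eps, eps)
        f_cand = fitness(cand)
        if f_cand >= best:  # greedy parent replacement
            delta, best = cand, f_cand
    return np.clip(x + delta, 0.0, 1.0)

# Toy stand-ins so the sketch runs end to end; in practice these would be
# queries to a real recognition model and an interpreter such as Grad-CAM.
rng = np.random.default_rng(0)
W = rng.standard_normal((10, 32 * 32 * 3))

def classify(x):
    return int(np.argmax(W @ x.ravel()))

def explain(x):
    return (np.abs(W[classify(x)]) * x.ravel()).reshape(x.shape)

x = rng.random((32, 32, 3))
x_adv = evolve_ae(x, classify, explain, n_iter=200, seed=1)
assert classify(x_adv) == classify(x)  # classification result unchanged

Rejecting any candidate that changes the predicted label enforces the constraint that the AE misleads only the interpreter; a real attack would query Grad-CAM or a similar interpreter in place of the toy explain above.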
