3:00 PM - 3:20 PM
[4J3-GS-2-04] Countermeasure and Identifying Poison Data against Backdoor Attack on Neural Networks Utilizing Knowledge Distillation
Keywords: Poisoning attack, Backdoor attack, Neural Network
A backdoor attack is a model poisoning attack against machine learning systems such as deep neural networks (DNNs). In a backdoor attack against an image classification system, an adversary creates tampered data containing adversarial marks and injects it into the training dataset. A DNN model trained on the tampered dataset achieves high classification accuracy on clean input data, but input data carrying the adversarial marks is misclassified to the adversarial target label. In this paper, we propose a countermeasure against the backdoor attack that utilizes knowledge distillation. The DNN model user distills clean knowledge from the backdoored model using clean unlabeled data. The distilled model achieves high classification accuracy without being affected by the backdoor. Furthermore, the user identifies the tampered data injected into the training dataset by comparing the classification results of the backdoored model and the distilled model.
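Below is a minimal sketch of the two steps the abstract describes: distilling a clean student from the backdoored model on unlabeled data, then flagging training samples on which the two models disagree. It assumes PyTorch; all identifiers (teacher, student, unlabeled_loader, train_loader), the temperature-scaled KL distillation loss, and the hyperparameters are illustrative assumptions, not taken from the paper.

import torch
import torch.nn.functional as F

def distill_clean_student(teacher, student, unlabeled_loader,
                          epochs=10, temperature=4.0, lr=1e-3, device="cpu"):
    """Train `student` to match the backdoored `teacher`'s soft predictions on
    clean, unlabeled data. Clean inputs do not activate the trigger, so the
    knowledge transferred to the student is approximately backdoor-free."""
    teacher.eval().to(device)
    student.train().to(device)
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x, *_ in unlabeled_loader:  # any labels in the loader are ignored
            x = x.to(device)
            with torch.no_grad():
                t_logits = teacher(x)
            s_logits = student(x)
            # Standard distillation loss: KL between softened distributions.
            loss = F.kl_div(
                F.log_softmax(s_logits / temperature, dim=1),
                F.softmax(t_logits / temperature, dim=1),
                reduction="batchmean") * temperature ** 2
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

def flag_suspicious_samples(teacher, student, train_loader, device="cpu"):
    """Return indices of training samples where the backdoored teacher and the
    distilled student disagree: on trigger-marked inputs the teacher predicts
    the adversarial target label while the clean student does not."""
    teacher.eval().to(device)
    student.eval().to(device)
    suspicious = []
    with torch.no_grad():
        for batch_idx, (x, _) in enumerate(train_loader):
            x = x.to(device)
            t_pred = teacher(x).argmax(dim=1)
            s_pred = student(x).argmax(dim=1)
            for i in torch.nonzero(t_pred != s_pred).flatten().tolist():
                suspicious.append(batch_idx * train_loader.batch_size + i)
    return suspicious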