6:10 PM - 6:30 PM
[2L6-OS-19b-03] Confident Learning using Confidence Entropy of Classifier Output
Keywords: Semi-Supervised Learning, Measurement Informatics, Machine Learning
There are two approaches to datasets with noisy labels: a model-centric approach, such as designing loss functions, and Confident Learning (hereafter CL), a data-centric approach that identifies label errors.
In CL, the problem is to find examples that belong to one class but are mislabeled as another class, and mislabeling is assumed to depend only on the class to which an example belongs, not on the individual example.
However, in real-world problems, data that does not belong to any class is sometimes labeled as one of the classes, and in such cases the problem setting and assumptions of CL do not hold.
In this study, we propose a method that sets a threshold on the entropy of the per-label confidence output by the classifier, trains a new classifier with the cases that exceed the threshold treated as noisy-label cases, and searches for the threshold by evaluating the confusion matrix of the test data with the trained classifier. As an example of application to real data, we discuss the problem of virus detection using nanopore devices.
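The entropy-thresholding step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the classifier outputs one probability per class for each example, uses Shannon entropy with the natural logarithm, and flags an example as a noisy-label candidate when its entropy exceeds the threshold. The function names, example probabilities, and candidate thresholds are hypothetical. Retraining the classifier and evaluating the confusion matrix for each candidate threshold are left out.

```python
import numpy as np

def prediction_entropy(probs, eps=1e-12):
    """Shannon entropy (natural log) of each row of class probabilities."""
    probs = np.asarray(probs, dtype=float)
    # eps avoids log(0) when a class probability is exactly zero.
    return -np.sum(probs * np.log(probs + eps), axis=1)

def flag_noisy(probs, threshold):
    """Boolean mask: True where entropy exceeds the threshold (assumed rule)."""
    return prediction_entropy(probs) > threshold

# Hypothetical classifier outputs for three examples over three classes.
probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
    [0.70, 0.20, 0.10],  # intermediate case
])

# Hypothetical candidate thresholds; the search described above would
# retrain and evaluate a classifier for each one.
for threshold in (0.5, 0.9):
    mask = flag_noisy(probs, threshold)
    print(threshold, mask.tolist())
```

A near-uniform output has entropy close to log(K) for K classes, so the threshold is naturally chosen between 0 and log(K).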