4:40 PM - 5:00 PM
[3N5-OS-11b-04] Fairwashing: the risk of rationalization
Keywords: Fairness in machine learning, Interpretability in machine learning
Black-box explanation is the problem of explaining how a machine learning model produces its outcomes. While current model explanation techniques provide interpretability, they can also be misused to perform fairwashing, which we define as promoting the perception that a machine learning model respects fairness when this might not be the case. We demonstrate that decisions taken by an unfair black-box model can be systematically rationalized using model explanation with a given fairness metric. Our solution, LaundryML, is based on a regularized rule list enumeration algorithm whose objective is to search for fair rule lists that approximate an unfair black-box model. We empirically evaluate our method on black-box models trained on real-world datasets and show that one can obtain rule lists with high fidelity that are at the same time considerably less unfair.
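To make the trade-off between fidelity and unfairness concrete, the following is a minimal Python sketch of how a candidate explanation could be scored. It is not the LaundryML implementation: the abstract does not name the fairness metric or the exact objective, so the use of demographic parity, the function names, and the weights `beta` and `lam` are all assumptions made for illustration. The rule list enumeration itself is not shown; the surrogate is represented only by its predictions.

```python
def fidelity(surrogate_preds, black_box_preds):
    """Fraction of inputs on which the surrogate agrees with the black box."""
    agree = sum(s == b for s, b in zip(surrogate_preds, black_box_preds))
    return agree / len(black_box_preds)


def demographic_parity_gap(preds, protected):
    """Absolute difference in positive-prediction rate between the two groups.

    Demographic parity is an assumed choice of fairness metric here.
    """
    prot = [p for p, g in zip(preds, protected) if g == 1]
    rest = [p for p, g in zip(preds, protected) if g == 0]
    return abs(sum(prot) / len(prot) - sum(rest) / len(rest))


def laundering_objective(surrogate_preds, black_box_preds, protected,
                         n_rules, beta=0.5, lam=0.01):
    """Hypothetical regularized score (lower is better).

    Combines infidelity to the black box, unfairness of the surrogate,
    and a length penalty on the rule list. beta and lam are illustrative.
    """
    infidelity = 1.0 - fidelity(surrogate_preds, black_box_preds)
    unfairness = demographic_parity_gap(surrogate_preds, protected)
    return (1.0 - beta) * infidelity + beta * unfairness + lam * n_rules


# Toy data: black-box outputs and a binary protected attribute (made up).
black_box = [1, 1, 1, 0, 0, 0, 1, 0]
protected = [0, 0, 0, 0, 1, 1, 1, 1]

# Predictions of a hypothetical 2-rule surrogate that flips two decisions.
surrogate = [1, 1, 0, 0, 0, 1, 1, 0]

print("fidelity:", fidelity(surrogate, black_box))
print("black-box gap:", demographic_parity_gap(black_box, protected))
print("surrogate gap:", demographic_parity_gap(surrogate, protected))
```

On this toy data the surrogate agrees with the black box on most inputs while showing a smaller demographic parity gap, which is the kind of explanation the abstract describes as enabling fairwashing.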