JSAI2024

Presentation information

Organized Session

[2T6-OS-5c] OS-5

Wed. May 29, 2024 5:30 PM - 6:30 PM Room T (Room 62)

Organizers: Hiromi Arai (RIKEN AIP), Satoshi Oyama (Nagoya City University), Hisashi Kashima (Kyoto University), Emiko Tsutsumi (The University of Tokyo), Junichiro Mori (The University of Tokyo)

5:30 PM - 5:50 PM

[2T6-OS-5c-01] Unlearning Bias and Toxicity in Large Language Models

〇Huimin Lu¹, Masaru Isonuma¹﹐², Junichiro Mori¹﹐³, Ichiro Sakata¹ (1. The University of Tokyo, 2. The University of Edinburgh, 3. RIKEN Center for Advanced Intelligence Project (AIP))

Keywords: Generative AI, Large Language Models, Debiasing, Unlearning

Large language models (LLMs) often inherit biases from the vast corpora they are trained on.
Traditional debiasing methods, while effective to some extent, do not completely eliminate the biases and toxicity memorized by LLMs.
In this paper, we introduce a novel approach to debiasing LLMs based on unlearning: we perform gradient ascent on hate speech against minority groups, i.e., we minimize the likelihood of biased or toxic content.
Specifically, we propose a masked language modeling unlearning technique, which unlearns only the harmful part of the text.
This method enables LLMs to selectively forget and disassociate from biased and harmful content.
Experimental results demonstrate the effectiveness of our approach in diminishing bias while preserving language modeling ability.
Surprisingly, the results also reveal an unexpected potential for cross-domain transfer unlearning: debiasing one form of bias (e.g., gender) may help mitigate others (e.g., race and religion).
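As a rough illustration of the idea described above, and not the authors' implementation, the sketch below performs a single gradient-ascent unlearning step on a causal language model, restricting the loss to a flagged harmful span. GPT-2, the example sentence, the span mask, and the learning rate are all placeholder assumptions; the label value -100 is the Hugging Face convention for tokens excluded from the cross-entropy loss.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical stand-ins: GPT-2 for the LLM, a toy sentence for the
# hate-speech unlearning set, and a hand-set mask for the harmful span.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

text = "All members of GROUP are lazy."  # placeholder biased sentence
enc = tokenizer(text, return_tensors="pt")

# Suppose a toxicity tagger has flagged the harmful tokens; here every
# token is flagged purely for illustration.
harmful = torch.ones_like(enc["input_ids"], dtype=torch.bool)

labels = enc["input_ids"].clone()
labels[~harmful] = -100  # positions set to -100 are ignored by the loss,
                         # so only the harmful span contributes

outputs = model(**enc, labels=labels)
# Gradient ascent: negating the language-modeling loss makes the update
# *decrease* the likelihood of the flagged tokens.
(-outputs.loss).backward()
optimizer.step()
optimizer.zero_grad()

In a full training loop this ascent step would have to be balanced against an objective that preserves general language modeling ability, which the abstract reports the method maintains; that regularization is omitted from the sketch.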
