JSAI2025

Presentation information

Organized Session

[3L6-OS-32] OS-32

Thu. May 29, 2025 5:40 PM - 7:20 PM Room L (Room 1007)

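Organizers: Ryota Takatsuki (AI Alignment Network / The University of Tokyo), Gouki Minegishi (The University of Tokyo), Yosuke Miyanishi (CyberAgent / Japan Advanced Institute of Science and Technology), Yu Takagi (National Institute of Informatics)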

7:00 PM - 7:20 PM

[3L6-OS-32-05] Investigating Gender Bias in Multilingual Large Language Models Using Sparse Auto-Encoders

〇Tota Abe1, Namgi Han1, Yusuke Miyao1 (1. Univ. of Tokyo)

Keywords: Sparse Auto-Encoder, Gender Bias, Large Language Model, Mechanistic Interpretability

This research investigates how multilingual Large Language Models (LLMs) encode gender biases in English and Japanese.
Gender biases plausibly manifest differently depending on the language of the text on which an LLM is trained.
However, it remains unclear how multilingual LLMs learn and encode gender biases for different languages.
We extract gender bias features for multiple languages using Sparse Auto-Encoders (SAEs) and examine whether those features are shared across languages.
More specifically, we feed multilingual LLMs gender-stereotypical and anti-gender-stereotypical texts.
We extract interpretable features from neurons in the inner layers of the LLMs using SAEs and look for features that activate differently between the two types of text.
Then, we compare the feature activations between the English and Japanese cases.
The experimental results indicate that gender bias is encoded in distinct parts of multilingual LLMs depending on the language.
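
The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the SAE encoder weights, the hidden states, the top-k selection rule, and the Jaccard overlap measure are all assumptions made for illustration, and random arrays stand in for real LLM activations. The helper names (`sae_encode`, `top_differing_features`, `jaccard`) are hypothetical.

```python
import numpy as np


def sae_encode(hidden, W_enc, b_enc):
    """Toy SAE encoder (assumed form): ReLU(hidden @ W_enc + b_enc)."""
    return np.maximum(hidden @ W_enc + b_enc, 0.0)


def top_differing_features(acts_stereo, acts_anti, k):
    """Indices of the k features whose mean activation differs most
    between stereotypical and anti-stereotypical texts."""
    diff = np.abs(acts_stereo.mean(axis=0) - acts_anti.mean(axis=0))
    return set(np.argsort(diff)[-k:].tolist())


def jaccard(a, b):
    """Overlap between two feature-index sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Placeholder data: random vectors stand in for hidden states taken
# from an inner layer of a multilingual LLM (hypothetical sizes).
rng = np.random.default_rng(0)
d_model, n_features, n_texts, k = 16, 64, 32, 5
W_enc = rng.normal(size=(d_model, n_features))
b_enc = np.zeros(n_features)

features = {}
for lang in ("en", "ja"):
    hidden_stereo = rng.normal(size=(n_texts, d_model))
    hidden_anti = rng.normal(size=(n_texts, d_model))
    acts_stereo = sae_encode(hidden_stereo, W_enc, b_enc)
    acts_anti = sae_encode(hidden_anti, W_enc, b_enc)
    features[lang] = top_differing_features(acts_stereo, acts_anti, k)

# A low overlap would suggest that different features respond to
# gender stereotypes in each language.
overlap = jaccard(features["en"], features["ja"])
print(f"English/Japanese feature overlap (Jaccard): {overlap:.2f}")
```

With real data, the random arrays would be replaced by hidden states collected while the model reads the English and Japanese text pairs, and the encoder weights by those of a trained SAE.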
