JSAI2025

Presentation information


[2H4-GS-11] AI and Society

Wed. May 28, 2025 1:40 PM - 3:20 PM Room H (Room 1003)

Chair: 篠田 一聡 (NTT) [Online]

2:20 PM - 2:40 PM

[2H4-GS-11-03] An analysis of the safety of a Japanese-based LLM against stereotypical prompts

Akito Nakanishi¹, 〇Yukie Sano¹, Geng Liu², Francesco Pierri² (1. University of Tsukuba, 2. Politecnico di Milano)

Keywords: Large Language Model, Stereotype, Japanese-based LLM, Toxicity Analysis, Sentiment Analysis

As large language models (LLMs) gain increasing attention, concerns have also been raised about stereotypical outputs and underlying social biases. While extensive research has been conducted on English-based LLMs, studies on Japanese models remain limited. This study examines the safety of Japanese LLMs in responding to stereotype-triggering prompts. We constructed 3,612 prompts by combining 301 social groups with 12 stereotype-inducing templates in Japanese and conducted three tasks using models trained on Japanese, English, and Chinese. Our findings show that LLM-jp had the lowest refusal rate and was more likely to generate toxic and negative responses compared to other models. Additionally, prompt format significantly influenced all models, and the generated responses included exaggerated reactions toward specific social groups, varying across models. These results highlight the need to improve safety mechanisms in Japanese LLMs and contribute to discussions on bias mitigation and their safe and responsible deployment.
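
The prompt count in the abstract follows from a full cross of groups and templates: 301 × 12 = 3,612. The sketch below illustrates that combination step only. It is not the authors' code; the function name `build_prompts`, the `{group}` slot marker, and the placeholder group and template strings are all assumptions made for illustration, since the abstract does not list the actual groups or templates.

```python
from itertools import product


def build_prompts(groups, templates, slot="{group}"):
    """Fill every template with every group name (full Cartesian product).

    `slot` is a hypothetical marker for where the group name goes.
    """
    return [template.replace(slot, group)
            for group, template in product(groups, templates)]


# Placeholder data only: the sizes match the abstract (301 groups, 12 templates),
# but the strings are invented stand-ins, not the study's material.
groups = [f"group_{i}" for i in range(301)]
templates = [f"template_{j}: {{group}}" for j in range(12)]

prompts = build_prompts(groups, templates)
print(len(prompts))  # 3612
```

Because every group is paired with every template, per-template and per-group comparisons can be made over the same balanced set of prompts.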
