10:00 AM - 10:20 AM
[3G1-GS-11-04] Creating Japanese Virtue Dataset for AI Safety
Keywords: Natural Language Processing, AI Safety, Virtue Ethics, AI Alignment
Some AI models, such as large language models (LLMs), are known to generate content that is harmful to humans. AI alignment research aims to ensure that AI models understand our ethics and behave appropriately. However, most of these studies are conducted in English, and few address Japanese. This study therefore creates a dataset for AI safety based on virtue ethics, one of the major positions in normative ethics. We build the new Japanese dataset using the same construction method that was used for an existing English virtue ethics dataset. The resulting dataset contains approximately 20,000 cases and is used to evaluate whether an AI model can correctly classify the correspondence between a sentence describing an action and a character trait term describing that action. Experiments with existing Japanese LLMs show that these models have difficulty classifying this correspondence correctly. We also compare our dataset with the existing English virtue ethics dataset.
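The abstract does not specify the record format or the scoring procedure. As an illustration only, the sketch below assumes a binary format in which each action sentence is paired with candidate trait terms, and each pair is labeled 1 if the trait describes the action and 0 otherwise. The names `VirtueExample`, `accuracy`, and `grouped_exact_match`, the grouped exact-match metric, and the toy English sentences are hypothetical assumptions, not the authors' data or evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical record layout (assumption): one action sentence paired with one trait term.
@dataclass(frozen=True)
class VirtueExample:
    action: str  # sentence describing an action
    trait: str   # character trait term, e.g. "honest"
    label: int   # 1 if the trait describes the action, else 0

# A predictor maps (action, trait) to a 0/1 label; an LLM prompt could sit behind it.
Predictor = Callable[[str, str], int]

def accuracy(examples: Sequence[VirtueExample], predict: Predictor) -> float:
    """Share of (action, trait) pairs whose predicted label equals the gold label."""
    if not examples:
        return 0.0
    hits = sum(predict(ex.action, ex.trait) == ex.label for ex in examples)
    return hits / len(examples)

def grouped_exact_match(examples: Sequence[VirtueExample], predict: Predictor,
                        group_size: int) -> float:
    """Hypothetical stricter metric: share of consecutive groups (one action,
    several candidate traits) in which every pair is classified correctly."""
    if group_size <= 0 or len(examples) % group_size != 0:
        raise ValueError("examples must split evenly into groups")
    groups = [examples[i:i + group_size] for i in range(0, len(examples), group_size)]
    if not groups:
        return 0.0
    solved = sum(all(predict(ex.action, ex.trait) == ex.label for ex in g) for g in groups)
    return solved / len(groups)

# Invented toy pairs for illustration only; not drawn from the dataset.
examples = [
    VirtueExample("She returned a lost wallet to its owner.", "honest", 1),
    VirtueExample("She returned a lost wallet to its owner.", "greedy", 0),
    VirtueExample("He yelled at a waiter over a small mistake.", "patient", 0),
    VirtueExample("He yelled at a waiter over a small mistake.", "rude", 1),
]

def always_yes(action: str, trait: str) -> int:
    # Trivial baseline: claims every trait fits every action.
    return 1

print(accuracy(examples, always_yes))                # 0.5
print(grouped_exact_match(examples, always_yes, 2))  # 0.0
```

The grouped metric is included because a model that answers "yes" to everything can reach chance-level pairwise accuracy while solving no action at all; whether the study scores models this way is not stated in the abstract.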