2:20 PM - 2:40 PM
[4A3-GS-10-02] Development of Prompt Attack Data Collection Application for LLMs and Analysis of Collected Data Characteristics
[[Online]]
Keywords:AI Safety, Jailbreak, Prompt Injection
As Large Language Models become increasingly widespread, countermeasures against attack methods such as jailbreaking and prompt injection have become an urgent issue. Existing defense methods like Safeguard Models, including Llama Guard, have been found to perform inadequately against attacks in Japanese. In this research, we developed AILBREAK, an attack dataset collection application utilizing gamification to enhance LLM defense capabilities against Japanese-language prompt attacks. This application implements a mechanism that collects manually crafted attack prompts from users, featuring stages composed of various challenges based on safety categories, such as extracting passwords from enemies through battle-game elements. The design achieves both educational impact and data collection objectives. The collected dataset will be made publicly available for improving LLM defense functions and developing Japanese language-specific Safeguard Models. This paper reports on the application design, data collection methodology, and characteristics of the collected data.
Authentication for paper PDF access
A password is required to view paper PDFs. If you are a registered participant, please log on the site from Participant Log In.
You could view the PDF with entering the PDF viewing password bellow.