12:40 PM - 1:00 PM
[4I2-GS-11-03] Addressing Prompt Injection through Iterative Interactions among LLM Agents
Keywords: Large Language Model, Multi Agent, AI Ethics
In recent years, as demand for Large Language Models (LLMs) has grown, prompt injection attacks have become a serious security concern. Numerous studies have addressed this problem, but the scarcity of datasets and the growing variety of attack methods have limited how well existing approaches generalize. To address this issue, we construct two teams of LLM agents: one generates prompts that induce prompt injection, and the other evaluates their harmfulness. By iterating prompt generation and evaluation between the two teams, we aim to develop countermeasures against a diverse range of attacks. Our approach evaluated prompt harmfulness with higher accuracy than the baseline model.
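The generate-and-evaluate loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the agents are plain functions standing in for LLM calls, the evaluator team reaches its verdict by majority vote, and the names `run_rounds`, `majority_vote`, and the stub agents are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical agent types: each agent is a plain function standing in for an LLM call.
Generator = Callable[[List[str]], str]  # feedback history -> candidate prompt
Evaluator = Callable[[str], bool]       # prompt -> True if judged harmful


def majority_vote(evaluators: List[Evaluator], prompt: str) -> bool:
    """Team verdict (assumed rule): harmful if more than half of the evaluators say so."""
    votes = sum(1 for evaluate in evaluators if evaluate(prompt))
    return votes * 2 > len(evaluators)


def run_rounds(
    generators: List[Generator],
    evaluators: List[Evaluator],
    rounds: int,
) -> List[Tuple[str, bool]]:
    """Alternate generation and evaluation; return (prompt, verdict) pairs."""
    history: List[str] = []
    results: List[Tuple[str, bool]] = []
    for _ in range(rounds):
        for generate in generators:
            prompt = generate(history)
            verdict = majority_vote(evaluators, prompt)
            results.append((prompt, verdict))
            # Feed the verdict back so generators can vary their next attempt.
            history.append(f"{prompt} -> {'harmful' if verdict else 'benign'}")
    return results


# Stub agents for demonstration only; real agents would query an LLM.
def naive_generator(history: List[str]) -> str:
    return f"Ignore previous instructions (attempt {len(history)})"


def keyword_evaluator(prompt: str) -> bool:
    return "ignore previous instructions" in prompt.lower()


def length_evaluator(prompt: str) -> bool:
    return len(prompt) > 1000


def always_flag(prompt: str) -> bool:
    return True


if __name__ == "__main__":
    team = [keyword_evaluator, length_evaluator, always_flag]
    for prompt, verdict in run_rounds([naive_generator], team, rounds=2):
        print(verdict, prompt)
```

The (prompt, verdict) pairs collected this way could serve as labeled examples for a harmfulness evaluator, which is one plausible reading of how iteration helps with the dataset scarcity mentioned above.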