Addressing Prompt Injection through Iterative Interactions among LLM Agents

Go Sato

12:40 PM - 1:00 PM

[4I2-GS-11-03] Addressing Prompt Injection through Iterative Interactions among LLM Agents

〇Go Sato¹, Ryohei Orihara¹, Yasuyuki Tahara¹, Akihiko Ohsuga¹, Yuichi Sei¹ (1. The University of Electro-Communications)

Keywords: Large Language Model, Multi Agent, AI Ethics

In recent years, as demand for Large Language Models (LLMs) has grown, prompt injection attacks have become a serious security concern. Numerous studies have attempted to address this problem. However, the scarcity of datasets and the growing variety of attack methods have limited how well existing countermeasures generalize. To address this issue, we construct two teams of LLM agents: one that generates prompts designed to induce prompt injection, and another that evaluates their harmfulness. Through iterative prompt generation and evaluation between these teams, we aim to develop countermeasures against a diverse range of attacks. In our experiments, the approach achieved higher accuracy than the baseline model in evaluating prompt harmfulness.
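The abstract describes the method only at a high level. As an illustration, the minimal Python sketch below shows one possible shape of an iterative loop between an attacker team and an evaluator team. Everything in it is an assumption, not the authors' implementation: the classes `AttackerAgent` and `EvaluatorAgent`, the majority-vote function `team_verdict`, and the "learn a marker from each missed attack" feedback step are hypothetical stand-ins for LLM calls, and the attacker side is kept fixed rather than adaptive for brevity.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for LLM agents. A real system would call an LLM;
# these use fixed rules so the loop structure runs on its own.


@dataclass
class AttackerAgent:
    """Hypothetical attacker: wraps a payload in an injection-style template."""
    template: str

    def generate(self, payload: str) -> str:
        return self.template.format(payload=payload)


@dataclass
class EvaluatorAgent:
    """Hypothetical evaluator: flags a prompt if it contains a learned marker."""
    markers: set[str] = field(default_factory=set)

    def is_harmful(self, prompt: str) -> bool:
        text = prompt.lower()
        return any(m in text for m in self.markers)

    def learn(self, marker: str) -> None:
        self.markers.add(marker.lower())


def team_verdict(evaluators: list[EvaluatorAgent], prompt: str) -> bool:
    """Assumed aggregation: strict majority vote across the evaluation team."""
    votes = sum(e.is_harmful(prompt) for e in evaluators)
    return votes * 2 > len(evaluators)


def run_rounds(attackers, evaluators, payload: str, rounds: int) -> list[float]:
    """Alternate generation and evaluation; return the detection rate per round."""
    history = []
    for _ in range(rounds):
        # Every generated prompt is an attack by construction, so any prompt
        # the team fails to flag is a false negative.
        prompts = [a.generate(payload) for a in attackers]
        missed = [p for p in prompts if not team_verdict(evaluators, p)]
        history.append(1 - len(missed) / len(prompts))
        # Hypothetical feedback step: each evaluator learns the prefix of
        # every missed attack (assumes the template prefix identifies it).
        for p in missed:
            marker = p.split(":")[0]
            for e in evaluators:
                e.learn(marker)
    return history
```

For example, with two attacker templates and three evaluators that start with no knowledge, the first round detects nothing and the second round detects every attack, because the evaluators have learned from the misses. The point of the sketch is only the feedback structure (generate, evaluate, learn from failures, repeat); the paper's actual agent roles, prompts, and scoring are not specified in this abstract.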


Presentation information

[4I2-GS-11] AI and Society:

