JSAI2024

Presentation information

General Session

[4M1-GS-10] AI application: Knowledge / Research

Fri. May 31, 2024 9:00 AM - 10:40 AM Room M (Room 53)

Chair: 宇野 裕 (NEC Corporation)

10:20 AM - 10:40 AM

[4M1-GS-10-05] Functional Estimation Method for Binary Code using Large Language Models

〇Minami Someya (1), Akira Otsuka (1) (1. Institute of Information Security)

Keywords: large language models, fine-tuning, distillation, function name prediction

Estimating the functionality of binary code is useful for malware analysis and vulnerability detection when the source code of a program is not available. Binary code is harder to understand than source code because it lacks symbolic information such as function and variable names. Although recent large language models (LLMs) have shown remarkable ability in understanding natural language and source code, their applicability to binary code has not yet been clarified. This study therefore applies LLMs to functional estimation of binary code and tackles the task of function name prediction. We use Gemini Pro to extract the rationale for function name estimation, and then fine-tune Code Llama on the rationales and function names. Evaluation experiments showed that learning both the rationale and the function name improved performance compared with fine-tuning on the function name alone. Furthermore, our method outperformed Gemini Pro with Chain-of-Thought prompting.
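
The two training conditions compared in the abstract (function name only versus rationale plus function name) can be illustrated with a minimal sketch of how the fine-tuning records might be assembled. Everything below is an assumption made for illustration: the abstract does not specify the input representation of the binary code, the prompt wording, or the target format, so the `Sample` fields, the `PROMPT` template, and the `Rationale:` / `Function name:` layout are hypothetical. The rationale is assumed to have been collected from the teacher model beforehand; no model API is called here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One training example (hypothetical layout, not taken from the paper)."""
    code: str        # symbol-stripped function body, e.g. disassembly
    rationale: str   # explanation previously obtained from the teacher model
    name: str        # ground-truth function name


# Hypothetical prompt template; the actual wording is not given in the abstract.
PROMPT = "Predict the name of the following function.\n\n{code}\n\n"


def build_record(sample: Sample, with_rationale: bool = True) -> dict:
    """Build one prompt/completion pair for supervised fine-tuning.

    with_rationale=True  -> the student learns the rationale, then the name
    with_rationale=False -> baseline: the student learns the name only
    """
    prompt = PROMPT.format(code=sample.code)
    if with_rationale:
        completion = f"Rationale: {sample.rationale}\nFunction name: {sample.name}"
    else:
        completion = f"Function name: {sample.name}"
    return {"prompt": prompt, "completion": completion}


def parse_name(generated: str) -> Optional[str]:
    """Extract the predicted name from generated text, or None if it is absent."""
    for line in reversed(generated.splitlines()):
        if line.startswith("Function name:"):
            return line.split(":", 1)[1].strip()
    return None
```

Placing the rationale before the name in the completion means the student model generates its explanation first and the name last, mirroring Chain-of-Thought output; this ordering is a design choice of the sketch, not a detail confirmed by the abstract.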
