Functional Estimation Method for Binary Code using Large Language Models

Minami Someya

10:20 AM - 10:40 AM

[4M1-GS-10-05] Functional Estimation Method for Binary Code using Large Language Models

〇Minami Someya¹, Akira Otsuka¹ (1. Institute of Information Security)

Keywords: large language models, fine-tuning, distillation, function name prediction

Functional estimation of binary code supports malware analysis and vulnerability detection for programs whose source code is not available. Binary code is more difficult to understand than source code because it lacks symbolic information such as function and variable names. Although recent large language models (LLMs) have shown remarkable ability in understanding natural language and source code, their applicability to binary code has not yet been clarified. This study therefore applies LLMs to functional estimation of binary code, taking function name prediction as the task. We use Gemini Pro to extract rationales for function name estimation, and then fine-tune Code Llama on the rationales together with the function names. In our evaluation, learning both the rationale and the function name improved performance over fine-tuning on the function name alone. Furthermore, our method outperformed Gemini Pro with Chain-of-Thought prompting.
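
The abstract does not give the data format used for fine-tuning, so the following is only a minimal sketch of how the two training variants it compares (function name only versus rationale plus function name) might be laid out as prompt/target pairs. Every name here (`Sample`, `PROMPT`, `build_example`, `extract_name`), the prompt wording, and the "Rationale: ... / Function name: ..." target layout are assumptions made for illustration, not the authors' implementation. The rationale field stands in for text produced by the teacher model; no call to Gemini Pro or Code Llama is made.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    code: str       # body of one function from a stripped binary (representation assumed)
    rationale: str  # teacher-generated explanation of what the code does (assumed text)
    name: str       # ground-truth function name


# Hypothetical prompt template; the actual wording is not given in the abstract.
PROMPT = "Predict the name of the following function.\n\n{code}\n\nAnswer:"


def build_example(sample: Sample, with_rationale: bool = True) -> dict:
    """Turn one sample into a prompt/target pair for supervised fine-tuning.

    with_rationale=True  -> the student learns to emit the rationale, then the name.
    with_rationale=False -> baseline variant: the student learns the name only.
    """
    prompt = PROMPT.format(code=sample.code)
    if with_rationale:
        target = f"Rationale: {sample.rationale}\nFunction name: {sample.name}"
    else:
        target = f"Function name: {sample.name}"
    return {"prompt": prompt, "target": target}


def extract_name(generated: str) -> str:
    """Recover the predicted name from generated text; empty string if absent."""
    marker = "Function name:"
    if marker not in generated:
        return ""
    tail = generated.rsplit(marker, 1)[1].split()
    return tail[0] if tail else ""
```

Under these assumptions, the same `Sample` list can produce both training sets by toggling `with_rationale`, so the two fine-tuning runs differ only in the target text. `extract_name` reads the final "Function name:" field, which lets both variants be scored on the predicted name alone.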

Presentation information

[4M1-GS-10] AI application: Knowledge / Research
