JSAI2024



General Session » GS-10 AI application

[1F4-GS-10] AI application: Chemistry / Physics

Tue. May 28, 2024 3:00 PM - 4:40 PM Room F (Temporary room 4)

Chair: 宮川 大輝 (NEC Corporation) [Online]

4:20 PM - 4:40 PM

[1F4-GS-10-05] Small Molecular Structure Generation Through Additional Pre-Training on Large Language Models

Takuma Shibahara¹ (presenter), Yasuho Yamashita¹, Tatsuya Okuno¹, Takaharu Hirayama¹ (¹ Axcelead Drug Discovery Partners)

Keywords: Molecular structure generation, Computational chemistry, LLM

Recent advances in machine learning have generated interest in applying it to drug discovery. Several kinds of models have been developed for generating molecular structures: character-based models that encode structures as strings, graph-based models that capture atomic bond connectivity, and 3D-based models that represent the spatial positions and bonding of atoms. This study focuses on character-based generative models, which could interpret complex natural-language instructions about compound attributes and thereby support the targeted generation and refinement of molecular structures. Our approach generates compound structures with Large Language Models (LLMs) through additional pre-training. In our experiments, we adapted the LLaMA-2 7B model using a dataset of small molecules. We compared the adapted model against JT-VAE, a graph-based generative model designed for compounds, using the MOSES benchmark for evaluation. Our findings suggest that the LLaMA-2 7B model has potential for advancing drug design, as it performed competitively with JT-VAE in compound generation and showed advantages over it.
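The abstract does not say which MOSES metrics were reported. As a rough illustration of how string-based generators are commonly scored, the sketch below computes simplified versions of three basic MOSES-style metrics: validity, uniqueness, and novelty. It is not the authors' evaluation code. The function name `moses_style_metrics` and the injected `canonicalize` callable are hypothetical; in practice `canonicalize` would be built on a cheminformatics toolkit such as RDKit. One simplification to note: MOSES reports uniqueness as unique@k over the first k valid samples, whereas this sketch uses all valid samples.

```python
from typing import Callable, Iterable, Optional

# Hypothetical type: maps a SMILES string to a canonical form, or None if it
# does not parse as a molecule. A real implementation would use a chemistry
# toolkit (e.g. RDKit); here it is passed in so the sketch stays self-contained.
Canonicalizer = Callable[[str], Optional[str]]


def moses_style_metrics(
    generated: list[str],
    training: Iterable[str],
    canonicalize: Canonicalizer,
) -> dict[str, float]:
    """Simplified validity, uniqueness and novelty in the spirit of MOSES."""
    if not generated:
        raise ValueError("no generated samples")

    # Validity: fraction of generated strings that parse into a molecule.
    canonical = [canonicalize(s) for s in generated]
    valid = [c for c in canonical if c is not None]
    validity = len(valid) / len(generated)

    # Uniqueness: fraction of valid molecules that are distinct after
    # canonicalization (MOSES uses unique@k; this sketch uses all valid ones).
    unique = set(valid)
    uniqueness = len(unique) / len(valid) if valid else 0.0

    # Novelty: fraction of unique valid molecules not found in the training
    # set, compared in canonical form so equivalent strings match.
    train_set = {c for c in (canonicalize(s) for s in training) if c is not None}
    novelty = len(unique - train_set) / len(unique) if unique else 0.0

    return {"validity": validity, "uniqueness": uniqueness, "novelty": novelty}


if __name__ == "__main__":
    # Toy stand-in canonicalizer (NOT chemistry): alphabetic strings are
    # "valid" and canonicalized by upper-casing; anything else is invalid.
    def toy_canonicalize(s: str) -> Optional[str]:
        return s.upper() if s.isalpha() else None

    samples = ["CCO", "cco", "C(", "CCN"]
    print(moses_style_metrics(samples, ["CCO"], toy_canonicalize))
```

In the toy run, "C(" is rejected as invalid and "cco" collapses onto "CCO", so the metrics separate the three failure modes a string generator can show: malformed strings, repeated molecules, and copies of training data.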
