4:20 PM - 4:40 PM
[1F4-GS-10-05] Small Molecular Structure Generation Through Additional Pre-Training on Large Language Models
Keywords: Molecular structure generation, computational chemistry, LLM
Recent advances in machine learning have generated interest in its application to drug discovery. Several model families have been developed for generating molecular structures: character-based models that encode structures as strings, graph-based models that capture atomic bond connectivity, and 3D-based models that represent the spatial positions and bonding of atoms. This study focuses on character-based generative models, which hold promise for interpreting complex natural-language instructions about compound attributes and thereby enabling targeted generation and refinement of molecular structures. Our approach applies additional pre-training to Large Language Models (LLMs) so that they can generate compound structures. In our experiments, we adapted the LLaMA-2 7B model using a dataset of small molecules. We evaluated the adapted model against JT-VAE, a graph-based generative model designed for compounds, using the MOSES benchmark. Our findings suggest that the LLaMA-2 7B model has potential to advance drug design, as it matches, and in some respects surpasses, JT-VAE in compound generation.
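The abstract does not include implementation details, but the described procedure, additional pre-training of LLaMA-2 7B on a small-molecule corpus, can be sketched with the Hugging Face transformers library. Everything below is an assumption for illustration: the corpus file name, the hyperparameters, and the use of a plain-text file with one molecule string per line are all hypothetical, not taken from the paper.

```python
# Hypothetical sketch: continued (additional) pre-training of LLaMA-2 7B on
# molecule strings with Hugging Face transformers. File names and
# hyperparameters are illustrative assumptions, not details from the paper.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # base model named in the abstract

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA-2 has no pad token by default

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Assumed corpus: one molecule string per line, e.g. "CC(=O)Oc1ccccc1C(=O)O".
dataset = load_dataset("text", data_files={"train": "smiles_train.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama2-molecules",
        per_device_train_batch_size=8,
        num_train_epochs=1,
        learning_rate=1e-5,
        bf16=True,
    ),
    train_dataset=tokenized,
    # Causal LM objective (mlm=False): next-token prediction over the strings.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
</code>
```

After training, one plausible evaluation path mirroring the abstract is to sample strings with `model.generate` and score them with `moses.get_all_metrics` from the MOSES package, which computes the benchmark's standard generation metrics; whether the authors used exactly this pipeline is not stated.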