10:00 AM - 10:20 AM
[4L1-GS-10-04] Development and Evaluation of a Sentence Embedding Model for Drug Information Retrieval
Keywords: Drug Information, Information Retrieval, Sentence Embedding, Retrieval Augmented Generation
The development of large language models (LLMs) has advanced chatbot technology, but LLMs often produce inaccurate responses in specialized areas such as drug information. Retrieval augmented generation, which supplements the model with external databases, has been explored to address this. This study aimed to create a sentence embedding model specifically for drug information in order to improve retrieval results. The evaluation dataset was created using [...] Experts matched these questions with corresponding queries in the QA collection, and these expert-linked associations served as the gold standard for evaluating search accuracy. To further refine the model, we used GPT-3.5 to create question pairs with varying Jaccard similarity coefficients and assessed their semantic similarity. These pairs, together with the QA collection, were used for additional training to enhance the model's retrieval capabilities. Top-5 accuracy improved from 82.5% before the additional training to 93.5% after it. This indicates the potential of specialized sentence embedding models for specialized knowledge domains.
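The two quantities named in the abstract, the Jaccard similarity coefficient between question pairs and top-5 retrieval accuracy against an expert gold standard, can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenization, the use of cosine similarity for ranking, and the names `jaccard`, `cosine`, and `top_k_accuracy` are all assumptions made for illustration, and the embedding vectors are expected to come from whatever sentence embedding model is under evaluation.

```python
import math


def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient between the word sets of two questions.

    Assumption: whitespace tokenization. A real pipeline would likely use
    a language-appropriate tokenizer.
    """
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty questions are treated as identical
    return len(sa & sb) / len(sa | sb)


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k_accuracy(query_vecs, gold_ids, corpus_vecs, k=5) -> float:
    """Fraction of queries whose gold QA entry is ranked in the top k.

    query_vecs:  list of embedding vectors, one per evaluation question
    gold_ids:    expert-assigned id of the matching QA entry per question
    corpus_vecs: dict mapping QA entry id -> embedding vector
    """
    hits = 0
    for q, gold in zip(query_vecs, gold_ids):
        # Rank QA entry ids by descending similarity to the query.
        ranked = sorted(
            corpus_vecs,
            key=lambda doc_id: cosine(q, corpus_vecs[doc_id]),
            reverse=True,
        )
        if gold in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)
```

Under these assumptions, running `top_k_accuracy` once with embeddings from the base model and once with embeddings from the additionally trained model, on the same gold standard, would give the kind of before/after comparison reported above.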