JSAI2024

Presentation information

General Session

[4F1-GS-3] Knowledge utilization and sharing

Fri. May 31, 2024 9:00 AM - 10:40 AM Room F (Temporary room 4)

10:20 AM - 10:40 AM

[4F1-GS-3-05] Estimation of Disease Model Organisms Based on Phenotypic Data Similarity

〇Tatsuya Kushida1, Jae-Moon Shin2, Daiki Usuda1, Yuki Yamagata1, Toyoyuki Takada1, Norio Kobayashi1, Toyofumi Fujiwara2, Hiroshi Masuya1 (1. RIKEN, 2. Database Center for Life Science)

Keywords: Bioinformatics, Knowledge graph, Data integration, Disease, TF-IDF

Disease model organisms, whether genetically modified or spontaneously mutated, are used to simulate specific diseases. They are indispensable for uncovering disease mechanisms, advancing drug discovery, and developing treatments. Using gene-disease association data, we have identified 1938 potential disease model mice for 1208 diseases from a pool of 7828 experimental mice. However, this gene-based method cannot identify models for diseases whose genetic basis is unknown. In this study, we searched for novel disease model mice by calculating set similarities (e.g., the Jaccard index) and TF-IDF cosine similarities using phenotypic data (e.g., urticaria) annotated to 4438 experimental mice, together with disease-phenotype interaction data. We then constructed a knowledge graph from the results so that they can be queried with SPARQL. As a result, we discovered 239 disease model mouse candidates related to 203 diseases, of which 179 candidates for 189 diseases (e.g., eosinophilopenia) were novel.
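
The two similarity measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the mouse and disease names, every phenotype term other than urticaria, and the smoothed IDF weighting (log(N/df) + 1) are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard index of two phenotype term collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tfidf_vectors(docs):
    """Map each entity name to a sparse {term: weight} TF-IDF vector.

    Assumption: relative term frequency times a smoothed IDF,
    log(N / df) + 1. The abstract does not state the weighting used.
    """
    n = len(docs)
    # Document frequency: number of entities annotated with each term.
    df = Counter(term for terms in docs.values() for term in set(terms))
    vectors = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        vectors[name] = {
            term: (count / len(terms)) * (math.log(n / df[term]) + 1.0)
            for term, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy annotations, not data from the study.
annotations = {
    "mouse_A": ["urticaria", "pruritus", "eosinophilia"],
    "disease_X": ["urticaria", "pruritus", "angioedema"],
    "disease_Y": ["seizure", "ataxia"],
}

vectors = tfidf_vectors(annotations)

# Score the mouse against each disease with both measures.
scores = {
    disease: (
        jaccard(annotations["mouse_A"], annotations[disease]),
        cosine(vectors["mouse_A"], vectors[disease]),
    )
    for disease in ("disease_X", "disease_Y")
}

# Rank diseases by TF-IDF cosine similarity, highest first.
for disease, (jac, cos) in sorted(scores.items(), key=lambda kv: -kv[1][1]):
    print(f"{disease}: jaccard={jac:.2f}, tfidf_cosine={cos:.2f}")
```

In this toy setting the mouse shares two of four distinct terms with disease_X and none with disease_Y, so both measures rank disease_X first. The measures differ in that TF-IDF down-weights phenotypes annotated to many entities, whereas the Jaccard index treats every shared term equally.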