[2P-34] Development of a protein-ligand interaction prediction method by supervised machine learning using the PDBbind database
With the improvement of computational power, machine learning has come to be used throughout drug discovery. As large publicly available databases such as the Protein Data Bank have grown rapidly in recent years, molecular docking has also become popular in fields such as computer-aided drug design. Our group previously reported a comprehensive classification of protein-small ligand interactions using an unsupervised parametric pattern recognition technique based on the Gaussian mixture model (Kasahara et al., 2013). Here, we apply this technique to the development of a new knowledge-based docking method. For statistical analyses, 4,565 protein-ligand complexes were extracted from the "refined set" of the PDBbind database (released in Dec 2019). This dataset was clustered at a 70% protein sequence identity threshold, and one representative was taken from each cluster, giving a non-redundant dataset of 1,155 entries. In this study, we developed supervised classifiers based on neural networks, support vector machines, random forests, and XGBoost to distinguish native structures from decoy structures.
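
The redundancy-reduction step (clustering at a 70% identity threshold and keeping one representative per cluster) can be illustrated with a minimal sketch. The abstract does not say which clustering tool or identity measure was used, so everything here is an assumption: `toy_identity` is a hypothetical stand-in that compares positions without alignment, `greedy_cluster` is a simple greedy scheme, and the first member of each cluster is taken as its representative.

```python
from typing import Callable, Dict, List


def toy_identity(a: str, b: str) -> float:
    """Fraction of identical positions, without alignment.

    Hypothetical stand-in for a real alignment-based sequence identity.
    """
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(
    seqs: Dict[str, str],
    threshold: float = 0.70,  # 70% identity threshold, as in the abstract
    identity: Callable[[str, str], float] = toy_identity,
) -> Dict[str, List[str]]:
    """Greedy clustering: map each representative ID to its cluster members.

    Each entry joins the first existing representative it matches at or
    above the threshold; otherwise it founds a new cluster and becomes
    that cluster's representative (an assumed selection rule).
    """
    clusters: Dict[str, List[str]] = {}
    for entry_id, seq in seqs.items():
        for rep_id in clusters:
            if identity(seq, seqs[rep_id]) >= threshold:
                clusters[rep_id].append(entry_id)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[entry_id] = [entry_id]
    return clusters


if __name__ == "__main__":
    # Made-up toy sequences, not PDBbind data.
    example = {
        "a": "AAAAAAAAAA",
        "b": "AAAAAAAAAC",
        "c": "CCCCCCCCCC",
        "d": "AAAAAACCCC",
    }
    clusters = greedy_cluster(example)
    print("representatives:", list(clusters))
```

The keys of the returned dictionary form the non-redundant set. In practice the identity function would be replaced by an alignment-based measure, and the representative might be chosen by another criterion.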