[3-C-1-04] Towards Diagnostic Model Learning from Multi-modal Datasets
Deep Learning, Diagnostic Model, Multi-modal Data
Building a diagnostic system that approaches human capability has long been a goal of AI applications in medicine. Research in this area began in the 1970s and led to a variety of expert systems, but these systems revealed fundamental limitations, such as the inability to learn from their own failures. The pursuit of solutions to these problems gave rise to diverse lines of AI research. Among them, the construction of diagnostic models from conventional text data using machine learning techniques began in the 1990s, making it possible to derive rules comparable to those hand-crafted for rule-based expert systems. In the 2000s, kernel machines such as support vector machines emerged and notably improved the accuracy of classifiers for image data. In the late 2010s, deep learning frameworks enabled highly accurate classification across a range of media, including not only images but also audio and natural language, and clinical applications of image-based diagnosis advanced accordingly. Deep learning frameworks have since evolved further, yielding sophisticated natural language processing capabilities such as machine translation and conversational response systems. In actual medical problem-solving, however, accurate diagnosis and treatment draw on information from multiple media, so practical clinical diagnostic systems must learn from multi-modal datasets. In the 2020s, methods for processing information in multi-modal learning have become an active topic of discussion. This panel presents the current state of technical challenges in learning from multi-modal data for the development of automated diagnosis systems and discusses issues for the future.
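As background for the panel topic, the following is a minimal, illustrative sketch of one common form of learning from multi-modal data: a late-fusion classifier that encodes an image and a clinical-text input separately and classifies their concatenated embeddings. The framework (PyTorch), the module sizes, and the name LateFusionDiagnosisModel are assumptions for illustration only and are not taken from the abstract or the panel presentations.

```python
# Minimal sketch (illustrative assumption, not the panel's method): late fusion
# of an image modality and a text modality for a diagnostic classifier.
import torch
import torch.nn as nn


class LateFusionDiagnosisModel(nn.Module):
    """Encode each modality separately, concatenate the embeddings,
    and classify the fused representation."""

    def __init__(self, num_classes: int = 2, text_vocab: int = 10000):
        super().__init__()
        # Image branch: a small CNN standing in for a pretrained backbone.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),              # -> (B, 32)
        )
        # Text branch: averaged bag-of-words embedding as a stand-in for a
        # transformer encoder over clinical notes.
        self.text_encoder = nn.EmbeddingBag(text_vocab, 32)     # -> (B, 32)
        # Fusion head over the concatenated modality embeddings.
        self.classifier = nn.Sequential(
            nn.Linear(32 + 32, 64), nn.ReLU(), nn.Linear(64, num_classes),
        )

    def forward(self, image: torch.Tensor, text_ids: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_encoder(image)       # (B, 32) image embedding
        txt_feat = self.text_encoder(text_ids)     # (B, 32) text embedding
        fused = torch.cat([img_feat, txt_feat], dim=1)
        return self.classifier(fused)              # class logits


if __name__ == "__main__":
    model = LateFusionDiagnosisModel(num_classes=3)
    images = torch.randn(4, 3, 64, 64)             # dummy image batch
    notes = torch.randint(0, 10000, (4, 20))       # dummy token-id batch
    logits = model(images, notes)
    print(logits.shape)                            # torch.Size([4, 3])
```

Late fusion is only one design choice; the technical challenges the panel addresses (missing modalities, alignment between media, and how early or late to combine them) arise regardless of which fusion strategy is used.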