Presentation information

International Session

[1S5-IS-2a] Machine learning

Tue. Jun 14, 2022 4:20 PM - 6:00 PM Room S (Online S)

Chair: Toshihiko Matsuka (Chiba University)

5:20 PM - 5:40 PM

[1S5-IS-2a-04] Network Structure based Clustering of Multiple Heterogeneous Datasets Using Metadata

〇Takeshi Sakumoto1, Teruaki Hayashi2, Hiroki Sakaji2, Hirofumi Nonaka1 (1. Nagaoka University of Technology, 2. The University of Tokyo)


Keywords: clustering, heterogeneous data, network clustering, clustering of multiple heterogeneous datasets, metadata

Recent advances in computing and data exchange platforms have raised expectations for innovation through combining data. In machine learning in particular, researchers have focused on combining datasets. Most previous studies assume that researchers can easily access sets of closely related datasets that share similar topics, are contextually similar, or come from the same domain. In general, however, data providers do not necessarily design and create datasets on the premise that they will be exchanged or merged. Furthermore, unified schemas are currently insufficiently maintained, and the areas where they can be applied are limited. These problems make it difficult to search, discover, exchange, and utilize datasets on data platforms where various types of interdisciplinary data are exchanged. In this research, we propose a network-based method that uses non-human-readable metadata to detect clusters of closely related datasets within a collection of various types of datasets. Experimental results on Kaggle metadata datasets demonstrate the effectiveness of the proposed method.
