JSAI2025

Organized Session » OS-1

[1P4-OS-1b] OS-1

Tue. May 27, 2025 3:40 PM - 5:20 PM Room P (Room 801-2)

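Organizers: Kenji Suzuki (Sony Group), Satoshi Hara (The University of Electro-Communications), Hitomi Yanaka (The University of Tokyo), Saku Sugawara (National Institute of Informatics)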

3:40 PM - 4:00 PM

[1P4-OS-1b-01] Dataset Embedding Using Contextual Information: Application and Evaluation for Data Integration Tasks

Yuka Haruki¹ (presenter), Shigeru Ishikura², Kazuya Demachi², Teruaki Hayashi¹ (¹The University of Tokyo, ²Infomart Corporation)

Keywords: data integration, schema matching, embeddings, deep learning

In data integration, tasks such as schema matching are crucial but costly, which has motivated a range of automation efforts. This study introduces an embedding-based approach that incorporates contextual information to improve matching performance on both practical and business datasets. Building on EmbDI, we construct a quadripartite graph that adds column descriptions embedded with Sentence-BERT, and we generate edges based on column similarity. Experiments on nine diverse datasets show that our method outperforms EmbDI, particularly on datasets with many numerical columns and complex schema structures. These findings indicate that embedding-based learning enriched with contextual metadata is vital for high-precision schema matching and contributes to more accurate data integration, underscoring the importance of considering contextual metadata in real-world data integration environments.
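The abstract describes the graph construction only at a high level. The sketch below illustrates one plausible reading under stated assumptions, not the authors' implementation: starting from an EmbDI-style graph of value, row, and column nodes, each column also gets a description node, and description nodes from different tables are linked when the cosine similarity of their embeddings reaches a threshold. The toy tables, the node-name prefixes, `build_graph`, and the 0.6 threshold are all hypothetical. A bag-of-words vector stands in for Sentence-BERT so the example runs without downloading a model; with the `sentence-transformers` package one would instead call `SentenceTransformer(...).encode(...)`. The random walks and embedding training that EmbDI performs on the graph are omitted.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy input: table -> column -> (column description, cell values).
TABLES = {
    "orders": {
        "price": ("unit price of the ordered product in yen", ["120", "340"]),
        "item": ("name of the ordered product", ["apple", "pear"]),
    },
    "catalog": {
        "cost": ("unit price of the product in yen", ["120", "500"]),
        "product_name": ("product name", ["apple", "grape"]),
    },
}


def embed(text):
    """Stand-in for Sentence-BERT (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def build_graph(tables, threshold=0.6):
    """Return an undirected adjacency map over value, row, column, and description nodes."""
    adj = defaultdict(set)

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    desc_vecs = {}  # (table, column) -> embedding of the column description
    for table, columns in tables.items():
        for col, (description, values) in columns.items():
            col_node = f"col:{table}.{col}"
            link(col_node, f"desc:{table}.{col}")  # column <-> its description
            desc_vecs[(table, col)] = embed(description)
            for i, value in enumerate(values):
                # Identical values share one node across tables, so they bridge datasets.
                val_node = f"val:{value}"
                link(val_node, f"row:{table}#{i}")
                link(val_node, col_node)

    # Link description nodes of columns in different tables when their
    # description embeddings are similar enough.
    keys = sorted(desc_vecs)
    for i, (ta, ca) in enumerate(keys):
        for tb, cb in keys[i + 1:]:
            if ta != tb and cosine(desc_vecs[(ta, ca)], desc_vecs[(tb, cb)]) >= threshold:
                link(f"desc:{ta}.{ca}", f"desc:{tb}.{cb}")
    return adj


graph = build_graph(TABLES)
print(sorted(n for n in graph["desc:orders.price"] if n.startswith("desc:")))
# ['desc:catalog.cost']
```

With these toy descriptions, `orders.price` is linked to `catalog.cost` and `orders.item` to `catalog.product_name`. Random walks over the graph therefore gain a path between semantically matching columns that does not depend on shared cell values, which is the kind of signal that matters most for numerical columns whose values rarely coincide.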
