3:40 PM - 4:00 PM
[1P4-OS-1b-01] Dataset Embedding Using Contextual Information: Application and Evaluation for Data Integration Tasks
Keywords: data integration, schema matching, embeddings, deep learning
In data integration, tasks such as schema matching are crucial but costly, which has motivated many automation efforts. This study introduces an embedding-based approach that incorporates contextual information to improve matching performance on both practical and business datasets. Building on EmbDI, we construct a quadruple graph that incorporates column descriptions embedded with Sentence-BERT and generates edges based on the similarity between columns. Experiments on nine diverse datasets show that our method outperforms EmbDI, particularly on datasets with many numerical columns and complex schema structures. Our findings indicate that enriching embedding-based learning with contextual metadata is vital for high-precision schema matching and contributes to more accurate data integration. These results suggest that contextual metadata should be taken into account in real-world data integration environments.
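The abstract does not give implementation details. The following is a rough sketch of the general idea under stated assumptions, not the authors' method. It builds an EmbDI-style graph with row, column and value nodes and adds column-to-column edges when the embeddings of two column descriptions are sufficiently similar. Everything here is hypothetical: hand-written toy vectors stand in for Sentence-BERT embeddings, and the function names (`cosine`, `build_graph`), the node naming scheme and the 0.8 threshold are illustrative choices. Whole cell values serve as tokens in place of EmbDI's own tokenization.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_graph(tables, desc_vectors, threshold=0.8):
    """Hypothetical graph builder: EmbDI-like row/column/value-token edges
    (weight 1.0 here), plus column-column edges weighted by the cosine
    similarity of the columns' description embeddings."""
    edges = set()
    for tname, rows in tables.items():
        for i, row in enumerate(rows):
            rid = f"row:{tname}:{i}"
            for col, val in row.items():
                cid = f"col:{tname}.{col}"
                tid = f"tok:{val}"  # whole value used as a single token (simplification)
                edges.add((rid, tid, 1.0))  # row -- value token
                edges.add((cid, tid, 1.0))  # column -- value token
    # Extra edges from contextual metadata: similar column descriptions.
    for a, b in combinations(sorted(desc_vectors), 2):
        sim = cosine(desc_vectors[a], desc_vectors[b])
        if sim >= threshold:
            edges.add((f"col:{a}", f"col:{b}", round(sim, 3)))
    return edges


# Usage with two tiny tables whose column names differ.
tables = {
    "A": [{"price": 100, "city": "Tokyo"}],
    "B": [{"cost": 100, "town": "Tokyo"}],
}
# Toy 3-d vectors standing in for Sentence-BERT embeddings of column descriptions.
desc_vectors = {
    "A.price": [0.9, 0.1, 0.0],
    "B.cost": [0.85, 0.15, 0.05],
    "A.city": [0.0, 0.2, 0.95],
    "B.town": [0.05, 0.1, 0.9],
}
edges = build_graph(tables, desc_vectors)
col_edges = sorted((a, b) for a, b, _ in edges
                   if a.startswith("col:") and b.startswith("col:"))
print(col_edges)  # [('col:A.city', 'col:B.town'), ('col:A.price', 'col:B.cost')]
```

In the toy data, the description-similarity edges link `price` with `cost` and `city` with `town` directly. A random-walk embedding step like EmbDI's could then exploit these links even when the cell values alone are ambiguous, as with numerical columns.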