11:00 AM - 11:15 AM
▲ [11a-N107-8] Efficient workflow for automatic database creation from large scale scientific articles
Keywords: materials informatics, superconductors, machine learning
Automatically extracting databases of materials and their properties from the scientific literature is a building block of data-driven materials science (materials informatics).
We have developed a service that combines Apache Airflow (a workflow engine) with custom-made tasks that call our superconductor extraction service, in order to build a database of superconductor materials and their properties from PDF documents of scientific articles.
We have processed about 250,000 PDF documents from materials science journals from various publishers, including APS, IOP and Elsevier, and obtained a database of nearly 12,000 entries of superconductor materials and linked properties.