11:30 〜 11:45
▲ [24a-E203-10] From Automatically-Extracted Database Toward Semi-Supervised Curation
Keywords: materials informatics, superconductors, data mining
The automatic collection of materials information from research articles at large scale is a necessary component of rapid materials discovery using materials informatics (MI). We are working to create a new, automatically extracted database of superconducting materials.
However, after manually correcting a subset of records, we found that a) correction is time-consuming and uninspiring, b) the original PDFs are not always sufficient to collect all the information (e.g., cited papers may need to be checked), c) the use of general-purpose tools such as Excel fragments the data workflow, and d) it is challenging to reuse the corrected data to improve the underlying system. In this work, we present our solution for improving these aspects. We propose an architecture composed of a new front-end interface and organized around two workflows (Figure 1): a) record flagging, and b) record correction.
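The two workflows could be pictured with a minimal Python sketch. It is not the system described in this abstract: the `Record` fields, the `Status` values, and the functions `flag`, `correct`, and `training_pairs` are all hypothetical names chosen for illustration. The sketch only shows one way flagging and correction might share a single record store, so that corrections remain available for later reuse.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    EXTRACTED = "extracted"   # produced by the automatic pipeline
    FLAGGED = "flagged"       # marked by a curator as possibly wrong
    CORRECTED = "corrected"   # fixed by a curator


@dataclass
class Record:
    # Hypothetical schema: a material name and its critical temperature.
    material: str
    tc_kelvin: float
    status: Status = Status.EXTRACTED
    # Each entry is an (old values, new values) pair kept for later reuse.
    history: list = field(default_factory=list)


def flag(record: Record) -> None:
    """Workflow a): mark a record as needing review."""
    record.status = Status.FLAGGED


def correct(record: Record, **changes) -> None:
    """Workflow b): apply a correction and remember what was replaced."""
    previous = {name: getattr(record, name) for name in changes}
    record.history.append((previous, dict(changes)))
    for name, value in changes.items():
        setattr(record, name, value)
    record.status = Status.CORRECTED


def training_pairs(records):
    """Flatten all stored (old, new) pairs, e.g. as feedback for the extractor."""
    return [pair for r in records for pair in r.history]
```

For example, with a made-up extraction error, `r = Record("MgB2", 93.0)` followed by `flag(r)` and `correct(r, tc_kelvin=39.0)` leaves the record in the `CORRECTED` state, and `training_pairs([r])` returns the single old/new pair. Keeping the replaced values next to the record is one way to avoid the fragmentation that comes from editing exported spreadsheets.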