How to make the data sets in “dark long tail” open and preserve?

家森 俊彦

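Presentation information

International Session (Oral)

Session symbol M (Multidisciplinary and Interdisciplinary) » M-GI General Geosciences and Information Geosciences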

[M-GI04] Open Research Data and Interoperable Science Infrastructures for Earth & Planetary Sciences

Mon. May 23, 2016, 09:00 〜 10:30, A02 (APA Hotel & Resort Tokyo Bay Makuhari)

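Conveners: *村山泰啓 (Integrated Science Data System Research Laboratory, National Institute of Information and Communications Technology), Cecconi Baptiste (LESIA, Observatoire de Paris, CNRS, PSL Research University), 近藤康久 (Research Institute for Humanity and Nature), 石井励一郎 (Japan Agency for Marine-Earth Science and Technology), Crichon Daniel (Jet Propulsion Laboratory, National Aeronautics and Space Administration); Chairs: 近藤康久, 村山泰啓 (Integrated Science Data System Research Laboratory, National Institute of Information and Communications Technology)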

10:15 〜 10:30

[MGI04-06] How to make the data sets in “dark long tail” open and preserve?

*家森俊彦¹ (1. Data Analysis Center for Geomagnetism and Space Magnetism, Graduate School of Science, Kyoto University)

Keywords: open data, data preservation, small data set

In data analysis, we often run into difficulty because of a lack of data, and we try to find additional data sets by asking researchers in the same research community. Sometimes we find a data set that fills the gap, or an unexpected data set that turns out to be very useful. In most cases, however, we cannot find the data. We know that a huge number of data sets, mainly obtained on a research-project basis, are not registered with active data centres and hence are 'dark' to many of us. These data sets are typically built by small research groups for a limited period, and the data are not open to the public. Although they exist only for a limited period, such data are very important and useful if the location of the observation site is unique or if no other observations are available.
One way to bring such data sets out of the 'dark long tail' and make them open is to register metadata that describe the observations in as much detail as possible. An example of this in practice is IUGONET (Interuniversity Upper atmosphere Global Observation NETwork), which maintains a common metadata database and forms a virtual data centre over distributed databases at several institutions. This data system includes databases from the 'dark long tail' as well as large, well-known databases.
Another way is to use university repositories. In that case, however, we need a common method to find and retrieve the data sets.
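
The metadata registration idea described above can be illustrated with a minimal sketch. It assumes a hypothetical record layout and an in-memory registry; the class names, field names, and sample entries are invented for illustration and are not the IUGONET schema or any repository's actual interface. The point is only that a shared description of where, when, and by whom an observation was made is enough to make a small data set findable.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ObservationMetadata:
    """Hypothetical minimal record describing one small data set."""
    dataset_id: str
    title: str
    instrument: str
    latitude: float   # degrees north
    longitude: float  # degrees east
    start: date       # first day of observation
    end: date         # last day of observation
    holder: str       # group or institution that keeps the data


class MetadataRegistry:
    """Hypothetical common metadata database; the data stay with the holder."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        # Refuse duplicate identifiers so each data set has one description.
        if record.dataset_id in self._records:
            raise ValueError(f"already registered: {record.dataset_id}")
        self._records[record.dataset_id] = record

    def search(self, start, end,
               lat_range=(-90.0, 90.0), lon_range=(-180.0, 180.0)):
        """Return records overlapping the period and lying inside the box."""
        hits = []
        for r in self._records.values():
            overlaps = r.start <= end and r.end >= start
            inside = (lat_range[0] <= r.latitude <= lat_range[1]
                      and lon_range[0] <= r.longitude <= lon_range[1])
            if overlaps and inside:
                hits.append(r)
        return sorted(hits, key=lambda r: r.start)


# Fictitious sample entries, for illustration only.
registry = MetadataRegistry()
registry.register(ObservationMetadata(
    "example-magnetometer-a", "Campaign magnetometer record", "magnetometer",
    10.0, 100.0, date(2004, 6, 1), date(2006, 3, 31), "Example Group A"))
registry.register(ObservationMetadata(
    "example-radar-b", "One-year radar campaign", "radar",
    65.0, 20.0, date(2010, 1, 1), date(2010, 12, 31), "Example Group B"))

# Which registered data sets cover any part of 2005?
found = registry.search(date(2005, 1, 1), date(2005, 12, 31))
print([r.dataset_id for r in found])
```

In this sketch the registry stores only descriptions, so the data themselves can remain at the originating group or in a university repository while still being discoverable through one search.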