<strong>Improve data discovery and metadata interoperability via publishing structured data on the Web</strong>

Mingfang Wu, Leyla Jael Castro, Adam Shepherd

3:45 PM - 4:00 PM

[MGI31-08] Improve data discovery and metadata interoperability via publishing structured data on the Web

★Invited Papers

*Mingfang Wu¹, Leyla Jael Castro², Adam Shepherd³ (1.Australian Research Data Commons, Australia, 2.ZB MED Information Centre for Life Sciences, Germany, 3.Woods Hole Oceanographic Institution, US)

Keywords: Data discovery, Publishing structured data, Metadata interoperability, Schema.org vocabulary, FAIR data, Linked data

Over the past decade, sharing data through a growing number of public and domain-specific data repositories has become common scientific practice, both to improve research reproducibility and to align with Open Science initiatives. For example, the Registry of Research Data Repositories listed 23 data repositories in 2012; the number quickly increased to over 1,200 from across the globe by 2015, and the registry had more than 2,450 repositories[1] by 2020. While data sharing via data repositories is highly welcomed by the scientific community, discovering relevant data becomes ever more challenging, especially when the required data come from several repositories. In addition, data aggregators have to harmonise metadata from many sources that use a variety of metadata schemas.

The Web provides a global platform for discovering data. One current use of the Web as a data discovery platform relies on web-based data repositories publishing and presenting metadata as part of their landing pages. Such a presentation is easy for human users to read, but is not easily understood by search engines or, more generally, by machines. For machines to correctly interpret and process the meaning of metadata beyond a bag of words, the metadata needs to be marked up with a common vocabulary and in a machine-processable encoding. This structured markup makes both semantic and syntactic interoperability possible on the Web.
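As an illustration of such markup, the following minimal sketch builds a Schema.org `Dataset` description and serialises it as JSON-LD inside the script element that a landing page could embed. The helper names, the sample dataset and the `repository.example.org` URL are hypothetical and chosen only for illustration; a real repository would normally expose many more properties.

```python
import json


def dataset_jsonld(name, description, url, keywords):
    """Return a Schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",  # common vocabulary (semantic level)
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
    }


def as_script_tag(doc):
    """Wrap a JSON-LD dictionary in the script element used on web pages."""
    body = json.dumps(doc, indent=2)  # machine-processable encoding (syntactic level)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical record, for illustration only.
record = dataset_jsonld(
    name="Example ocean temperature dataset",
    description="Illustrative sea surface temperature measurements.",
    url="https://repository.example.org/dataset/123",
    keywords=["ocean", "temperature"],
)

print(as_script_tag(record))
```

The vocabulary (`@context` and `@type`) carries the semantic part of the markup, while the JSON-LD serialisation carries the syntactic part.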

In recent years, Schema.org has become a vocabulary commonly used by websites to describe their content and expose the corresponding structured metadata, so that search engines can better interpret its meaning and data searchers can benefit from more accurate results. Schema.org was originally intended for use in e-commerce applications, but nowadays it is also used by libraries around the world to publish bibliographic information in support of Linked Data (Godby et al. 2015). Some data repositories, for example those of NASA and NOAA, have already adopted this approach. There are also communities (e.g. the Schema.org Cluster of the Earth Science Information Partners[2]) that provide guidelines to their members to support consistent implementation of Schema.org markup. As more repositories adopt this approach, the research data community can take advantage of the enhanced metadata interoperability; for instance, data aggregators can explore new methods for metadata syndication via the web architecture. If implemented consistently, structured data can lead to linked metadata, which will enable smart web data discovery applications to perform to their potential.
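One way an aggregator might syndicate metadata via the web architecture is to read the JSON-LD embedded in repository landing pages. The sketch below is only an assumption about how such harvesting could look, not a description of any particular aggregator: it uses Python's standard `html.parser` module, works on inline sample pages rather than fetching them from the network, and handles only the simple case where `@type` is the single string `Dataset`. The class name, function name and sample pages are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the JSON-LD documents embedded in one HTML page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed markup instead of failing the harvest


def harvest_datasets(pages):
    """Return (name, url) pairs for every Schema.org Dataset found in the pages."""
    results = []
    for html in pages:
        parser = JsonLdExtractor()
        parser.feed(html)
        parser.close()
        for doc in parser.documents:
            if isinstance(doc, dict) and doc.get("@type") == "Dataset":
                results.append((doc.get("name"), doc.get("url")))
    return results


# Hypothetical landing pages, for illustration only.
pages = [
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org/", "@type": "Dataset", '
    '"name": "Example ocean temperature dataset", '
    '"url": "https://repository.example.org/dataset/123"}'
    "</script></head><body>Landing page</body></html>",
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org/", "@type": "Organization", '
    '"name": "Example repository"}'
    "</script></head><body>About page</body></html>",
]

harvested = harvest_datasets(pages)
print(harvested)
```

Because every page uses the same vocabulary and encoding, the aggregator needs no repository-specific mapping to recognise dataset descriptions.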

The Research Data Alliance Research Metadata Schemas Working Group[3] was formed to exchange experience and lessons learned from publishing structured metadata, and to promote consistent implementation of the publishing process across repositories. In this talk, we will present the group's guidelines, which comprise ten recommendations for publishing structured data on the Web. The recommendations range from high-level strategy and community engagement, through semantic interoperability, to consistent syntactic serialisation.

References:

Godby, C. J., Wang, S. and Mixter, J. K. (2015). Library Linked Data in the Cloud: OCLC’s Experiments with New Models of Resource Description. Available from: https://doi.org/10.2200/S00620ED1V01Y201412WBE012

Guha, R. V., Brickley, D. and Macbeth, S. (2015). Schema.org: Evolution of structured data on the Web: Big data makes common schemas even more necessary. Queue, November 2015. Available from: https://doi.org/10.1145/2857274.2857276

[1] https://blog.datacite.org/german-research-foundation-to-fund-new-services-of-re3data/

[2] The ESIP Schema.org Cluster: https://wiki.esipfed.org/Schema.org_Cluster

[3] Research Data Alliance Research Metadata Schemas Working Group: https://www.rd-alliance.org/groups/research-metadata-schemas-wg

Presentation information

[M-GI31] Open and FAIR Science: Data Sharing, e-Infrastructure, Data Citation and Reproducibility
