3:45 PM - 4:00 PM
[MGI31-08] Improve data discovery and metadata interoperability via publishing structured data on the Web
★Invited Papers
Keywords: Data discovery, Publishing structured data, Metadata interoperability, Schema.org vocabulary, FAIR data, Linked data
The Web provides a global platform for discovering data. One current use of the Web as a data discovery platform relies on web-based data repositories publishing and presenting metadata as part of their websites' landing pages. Such a presentation is easy for human users to read, but is not easily understood by search engines and, in general, by machines. For machines to correctly interpret and process the meaning of metadata beyond a bag of words, we need to mark up metadata with a common vocabulary and in a machine-processable encoding. This structured markup makes both semantic and syntactic interoperability possible on the Web.
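As an illustration of such markup, the following minimal Python sketch builds a Schema.org `Dataset` description and serialises it as JSON-LD inside a `<script>` element, one common way of embedding structured metadata in a landing page. The helper names (`dataset_jsonld`, `embed_in_html`), the chosen properties, and all dataset values are hypothetical and serve only as an example; a real repository would map its own metadata fields to the vocabulary.

```python
import json


def dataset_jsonld(name, description, url, keywords, license_url):
    """Build a Schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",  # the common vocabulary
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
        "license": license_url,
    }


def embed_in_html(record):
    """Serialise the record as JSON-LD wrapped in a script element."""
    payload = json.dumps(record, indent=2)
    # Escape "</" so that a stray "</script>" inside a metadata value
    # cannot close the element early; "\/" is a valid JSON escape.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Hypothetical example values, not taken from any real repository.
record = dataset_jsonld(
    name="Example sea surface temperature grid",
    description="Monthly gridded sea surface temperature, illustrative only.",
    url="https://repository.example.org/datasets/sst-grid",
    keywords=["sea surface temperature", "oceanography"],
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
snippet = embed_in_html(record)
print(snippet)
```

The vocabulary (`@context`, `@type` and the property names) carries the semantic agreement, while the JSON-LD encoding carries the syntactic one, so a machine reading the page does not have to guess which string is the title and which is the licence.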
In recent years, Schema.org has become a vocabulary commonly used by websites to describe their content and expose the corresponding structured metadata, so that search engines can better interpret its meaning and data searchers can benefit from more accurate results. Schema.org was originally intended for use in e-commerce applications, but is now also used by libraries around the world to publish bibliographic information in support of Linked Data (Godby et al. 2015). Some data repositories, for example those of NASA and NOAA, have already adopted this approach. There are also communities (e.g. the Schema.org Cluster of the Earth Science Information Partners[2]) that provide guidelines to their members to support consistent implementation of Schema.org markup. When more repositories adopt this approach, the research data community can take advantage of the enhanced metadata interoperability; for instance, data aggregators can explore new methods for metadata syndication via the web architecture. If implemented consistently, structured data can lead to linked metadata, which will enable smart web data discovery applications to reach their potential.
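To make the aggregator scenario concrete, here is a minimal sketch, using only the Python standard library, of how a harvester might pull Schema.org `Dataset` records out of a landing page that embeds JSON-LD. The class and function names (`JsonLdExtractor`, `extract_datasets`) and the sample page are hypothetical; a production harvester would also need to fetch pages, follow sitemaps, and handle other encodings such as Microdata or RDFa.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the JSON-LD objects found in script elements of a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # skip malformed markup instead of failing the harvest
            # A script element may hold one object or a list of objects.
            self.records.extend(parsed if isinstance(parsed, list) else [parsed])


def is_dataset(record):
    """True if the record declares the Schema.org type Dataset."""
    if not isinstance(record, dict):
        return False
    types = record.get("@type")
    return "Dataset" in (types if isinstance(types, list) else [types])


def extract_datasets(html):
    """Return every Dataset description embedded in the given HTML text."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [r for r in parser.records if is_dataset(r)]


# Hypothetical landing page used only to exercise the sketch.
page = """<html><head><title>Landing page</title>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Example dataset"}
</script></head><body><h1>Example dataset</h1></body></html>"""

datasets = extract_datasets(page)
print([d["name"] for d in datasets])
```

Because every repository that follows the same markup conventions can be read by the same few lines, consistent implementation is what turns individual landing pages into a syndication channel.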
The Research Data Alliance Research Metadata Schemas Working Group[3] was formed to exchange experience and lessons learned from publishing structured metadata and to achieve consistent implementation of the publishing process across repositories. In this talk, we will present guidelines comprising ten recommendations for publishing structured data on the Web. The recommendations range from high-level strategy and community engagement, through semantic interoperability, to consistent syntactic serialisation.
References:
Godby, C. J., Wang, S. and Mixter, J. K. (2015). Library Linked Data in the Cloud: OCLC’s Experiments with New Models of Resource Description. Available from: https://doi.org/10.2200/S00620ED1V01Y201412WBE012
Guha, R. V., Brickley, D. and Macbeth, S. (2015). Schema.org: Evolution of Structured Data on the Web: Big data makes common schemas even more necessary. Queue, November 2015. Available from: https://doi.org/10.1145/2857274.2857276
[1] https://blog.datacite.org/german-research-foundation-to-fund-new-services-of-re3data/
[2] The ESIP Schema.org Cluster: https://wiki.esipfed.org/Schema.org_Cluster
[3] Research Data Alliance Research Metadata Schemas Working Group: https://www.rd-alliance.org/groups/research-metadata-schemas-wg