Japan Geoscience Union Meeting 2022

Presentation information

[E] Oral

M (Multidisciplinary and Interdisciplinary) » M-GI General Geosciences, Information Geosciences & Simulations

[M-GI30] Open Science with FAIR Science Data Sharing and Management and e-Infrastructures

Tue. May 24, 2022 3:30 PM - 5:00 PM 201A (International Conference Hall, Makuhari Messe)

convener: Yasuhiro Murayama (NICT Knowledge Hub, National Institute of Information and Communications Technology), convener: Baptiste Cecconi (LESIA, Observatoire de Paris, CNRS, PSL Research University), Yasuhisa Kondo (Research Institute for Humanity and Nature), convener: Shelley Stall (American Geophysical Union), Chairperson: Yasuhiro Murayama (NICT National Institute of Information and Communications Technology), Baptiste Cecconi (LESIA, Observatoire de Paris, CNRS, PSL Research University)

3:30 PM - 3:45 PM

[MGI30-07] Experiences of migrating Environmental Data Science research to Virtual Labs

*Maria Salama^1,3, Gordon S. Blair^2,1,3 (1. Lancaster University, UK; 2. UK Centre for Ecology and Hydrology (UKCEH), UK; 3. Centre of Excellence for Environmental Data Science (CEEDS), UK)

Keywords: Virtual labs, environmental data science, DataLabs, open science, eScience, virtual research environments

Environmental data science research is typically transdisciplinary, with scientists, practitioners, and stakeholders creating data-driven solutions to different environmental challenges, often using large volumes of highly heterogeneous data and complex analytical methods. The concept of ‘virtual labs’ allows collaborating scientists to explore datasets, develop new methods, apply models to different datasets, and communicate the results to stakeholders, practitioners, and decision-makers across different scales (local, regional, or national).
DataLabs, a platform realising the concept of virtual labs, is a cloud-based virtual research environment, developed continuously by UKCEH using an agile approach, that supports open and collaborative science by providing the infrastructure and software tools for end-to-end analysis: from the assimilation and analysis of data through to the visualisation, interpretation, and discussion of the results. DataLabs has a service-oriented architecture, which allows the most appropriate technology to be selected for each component. HPC resources are provided by JASMIN, while data storage is available to all systems through shared block storage (an NFS cluster) and object storage (Quobyte S3). The main components of DataLabs include distributed computing services, analytical tools, analytics execution engines, narrative computing tools, publishing tools, and data management.
In the Data Science of the Natural Environment (DSNE) project, we are studying the current experiences, barriers, and opportunities associated with virtual labs, as well as the requirements for future developments and extensions. Within the DSNE project, a transdisciplinary team of environmental scientists, statisticians, computer scientists, and social scientists is collaborating to develop statistical and data science methods for environmental grand challenges. The DSNE group's research is currently being migrated to DataLabs. The aim is to bring the methods developed into one virtual space where users with different areas of expertise (scientists, stakeholders, policy-makers, and the public) who are interested in environmental science can tackle environmental problems, catering for users at different levels of abstraction.
The migration process involves researchers from different disciplines, with different backgrounds, varying prior experience of virtual labs, and varying levels of expertise with the DataLabs platform. In parallel with the migration, we are studying the researchers' experiences and addressing the technical challenges that arise. The study includes semi-structured interviews with each researcher before and after the migration. The first interview aims to understand their requirements and challenges and to discuss their expectations of using DataLabs; the second discusses their experience after using DataLabs and any requirements for future development that have emerged.
While researchers' outlook on DataLabs was generally positive, the first set of interviews revealed a variety of expectations and technical requirements. Having worked with their own setups before the migration, researchers expected DataLabs to support importing pre-configured software containers. In addition, because they deal with big data from multiple sources, they required access to external data storage from within DataLabs. We have also observed some cultural issues influencing the researchers' outlook on DataLabs. As is often the case with emerging platforms, a cultural shift is required to move from local facilities or one's own setup to a shared environment. The researcher's profile is another influencing factor: the stronger their technical skills, the stronger their preference for controlling the lower-level components of the architecture. The lessons learned so far represent the first steps towards codifying best practices in virtual lab design and development.