3:30 PM - 3:45 PM
[MGI30-07] Experiences of migrating Environmental Data Science research to Virtual Labs
Keywords: Virtual labs, environmental data science, DataLabs, open science, eScience, virtual research environments
DataLabs, a platform realising the concept of virtual labs, is a cloud-based virtual research environment in continuous development by UKCEH using an agile approach. It advocates open and collaborative science by providing the infrastructure and software tools to support end-to-end analysis, from the assimilation and analysis of data through to the visualisation, interpretation, and discussion of the results. The architecture of DataLabs is service-oriented, allowing the appropriate technology to be selected for each component. HPC resources are provided by JASMIN, while data storage is available to all systems through shared block storage (NFS cluster) and object storage (Quobyte S3). The main components of DataLabs include distributed computing services, analytical tools, analytics execution engines, narrative computing tools, publishing tools, and data management.
In the Data Science of the Natural Environment (DSNE) project, we are studying the current experiences, barriers and opportunities associated with virtual labs, as well as the requirements for future developments and extensions. Within the DSNE project, a transdisciplinary team of environmental scientists, statisticians, computer scientists and social scientists is collaborating to develop statistical and data science methods for environmental grand challenges. The research done by the DSNE researchers is currently being migrated to DataLabs. The aim is to bring the developed methods to users with different areas of expertise (scientists, stakeholders, policy-makers, and the public) who are interested in environmental science, gathering them in one virtual space to tackle environmental problems while catering for users at different levels of abstraction.
The migration process involves researchers from different disciplines, with different backgrounds and previous experiences of virtual labs, as well as different levels of expertise in using the DataLabs platform. In parallel with the migration process, we are studying the experiences of the researchers while mitigating the associated technical challenges. The study includes semi-structured interviews with each of the researchers, before and after the migration process. The first interview aims to understand their requirements and challenges and to discuss their outlook on using DataLabs, while the second discusses their experience after using DataLabs and any requirements arising for future development.
While the outlook on DataLabs was generally positive, the first set of interviews raised a variety of expectations and technical requirements. Having worked with their own setups before the migration, researchers expected DataLabs to support importing pre-configured software containers. Because they deal with big data from multiple sources, they also required access to external data storage from within DataLabs. We have also observed some cultural issues influencing the researchers’ outlook on DataLabs. As is the case with emerging platforms, a cultural shift is required to move from local facilities or one's own setup to a shared environment. The character of the researcher is another influencing factor: the higher the technical skills, the stronger the preference to control the lower-level components of the architecture. The lessons learned so far represent the first steps in the codification of best practices in the design and development of virtual labs.