15:30 〜 15:45
[AAS02-07] Data access for km-scale resolution models
★Invited Papers
Keywords: Climate Modeling, Data, Workflows, High-resolution, km-scale
With the transition to global, km-scale simulations, model outputs have grown in size, and efficient ways of accessing data have become more important than ever. This implies that the data storage has to be optimized for efficient read access to small sub-sets of the data, and multiple resolutions of the same dataset need to be provided for efficient analysis on coarse as well as fine-grained scales.
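As a rough illustration of what providing multiple resolutions of the same dataset can mean, the sketch below derives coarser copies of a field by block averaging. It is an assumption made for illustration only: the abstract does not say how the coarser levels are produced, a regular 2D grid is assumed, and the names `coarsen` and `build_pyramid` are hypothetical.

```python
import numpy as np


def coarsen(field: np.ndarray, factor: int = 2) -> np.ndarray:
    """Block-average a 2D field by `factor` along both axes."""
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid size must be divisible by the coarsening factor")
    # Split each axis into (blocks, cells per block) and average within blocks.
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))


def build_pyramid(field: np.ndarray, levels: int, factor: int = 2) -> list:
    """Return the field at successively coarser resolutions, finest first."""
    pyramid = [field]
    for _ in range(levels):
        pyramid.append(coarsen(pyramid[-1], factor))
    return pyramid


# Hypothetical 8 x 8 field standing in for one model variable at one time step.
fine = np.arange(64, dtype=float).reshape(8, 8)
pyramid = build_pyramid(fine, levels=2)
```

With such a hierarchy, an analysis on coarse scales reads only the small top-level arrays, while a regional study reads a few chunks of the finest level.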
We present an approach based on datasets. Each dataset represents a coherent subset of a model output (e.g. all model variables stored at daily resolution). Aiming for a minimum number of datasets drives us to enforce consistency in the model output, which in turn eases analysis. Each dataset is served to the user as one Zarr store, independent of the actual file layout on disk or other storage media. Multiple datasets are grouped in catalogs for findability.
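One way to picture a store that is independent of the file layout is a read-only mapping from chunk keys to byte ranges in arbitrary files. The following is a minimal sketch under that assumption, not the implementation described here: the class `ReferenceStore`, the file names, and the keys (shaped like Zarr v2 chunk keys) are all hypothetical.

```python
import os
import tempfile
from collections.abc import Mapping


class ReferenceStore(Mapping):
    """Read-only key -> bytes view over byte ranges in arbitrary files."""

    def __init__(self, refs):
        # refs maps each key to (path, offset, length); hypothetical layout.
        self._refs = dict(refs)

    def __getitem__(self, key):
        path, offset, length = self._refs[key]  # raises KeyError if unknown
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(length)

    def __iter__(self):
        return iter(self._refs)

    def __len__(self):
        return len(self._refs)


# Two files with made-up contents stand in for model output on disk.
tmpdir = tempfile.mkdtemp()
path_a = os.path.join(tmpdir, "output_part_a.bin")
path_b = os.path.join(tmpdir, "output_part_b.bin")
with open(path_a, "wb") as f:
    f.write(b"AAAABBBB")
with open(path_b, "wb") as f:
    f.write(b"CCCC")

# The user sees one flat key space, whatever the files look like underneath.
store = ReferenceStore({
    "tas/0.0": (path_a, 0, 4),
    "tas/0.1": (path_a, 4, 4),
    "tas/1.0": (path_b, 0, 4),
})
```

Because the mapping is the only thing the user sees, files can be reorganised or moved to other media by updating the references, without changing how the dataset is opened.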
By serving the data via HTTPS, we implement a middle layer between the user and the storage systems, which allows us to combine different storage backends behind a unifying frontend. At the same time, this approach allows us to build the system largely on existing technologies such as web servers and caches, and to serve data efficiently to users outside DKRZ.
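The idea of a unifying frontend in front of several storage backends, with a cache in between, can be sketched in a few lines. This is a toy model under stated assumptions rather than the system itself: in practice the roles below would be played by existing web servers and caches, and the class `Frontend`, the backend functions, and the `daily` and `archive` prefixes are hypothetical.

```python
from collections import OrderedDict


class Frontend:
    """Route requests to a backend by key prefix and cache the results (LRU)."""

    def __init__(self, backends, cache_size=128):
        self._backends = backends  # {prefix: callable(key) -> bytes}
        self._cache = OrderedDict()
        self._cache_size = cache_size

    def get(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        prefix, _, rest = key.partition("/")
        value = self._backends[prefix](rest)
        self._cache[key] = value
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict least recently used
        return value


calls = []  # records which backend was actually contacted


def disk_backend(key):
    calls.append(("disk", key))
    return b"disk:" + key.encode()


def tape_backend(key):
    calls.append(("tape", key))
    return b"tape:" + key.encode()


frontend = Frontend({"daily": disk_backend, "archive": tape_backend}, cache_size=2)
first = frontend.get("daily/tas/0.0")
second = frontend.get("daily/tas/0.0")  # answered from the cache
third = frontend.get("archive/tas/0.0")
```

The caller uses the same `get` call for every dataset; which backend holds the bytes, and whether they came from the cache, stays hidden behind the frontend.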
The approach we present is currently under development in the WarmWorld, nextGEMS and EERIE projects, and we expect it to be useful for many other projects as well.
