Scalar metrics for evaluating the scientific skill of Earth System Models

Marta Alerany Sole; Kai Keller; Leo Arriola; Mario Acosta

2:00 PM - 2:15 PM

[AAS05-02] Scalar metrics for evaluating the scientific skill of Earth System Models

★Invited Papers

*Marta Alerany Sole¹, Kai Keller¹, Leo Arriola¹, Mario Acosta¹ (1.Barcelona Supercomputing Center)

Keywords: Climate models, Scientific skill, Performance assessment

Coupled Earth System Models (ESMs) play a key role in understanding the impact of natural and anthropogenic drivers on climate change, which is essential for assessing climate risks and advising policymakers and governments in making informed decisions. To serve as the scientific basis for public policy and to attribute observed changes to specific drivers, ESMs need to provide high confidence in future climate projections. Part of this confidence stems from their scientific model skill, which refers to the ability of the model to accurately represent and predict various aspects of the climate system, including its capacity to forecast future climate changes or to capture complex patterns and relationships within the system.

Currently, modeling centers rely on custom metrics and testing protocols to assess the projection skill of climate models [1]. Some of these focus on specific phenomena such as the El Niño-Southern Oscillation (ENSO) [2] or precipitation [3], while others employ general performance indices and interpretable diagnostics such as those found in the AQUA suite [4]. In practice, evaluation usually relies on charts such as time series of globally averaged fields or spatial bias plots. These are susceptible to subjective and biased interpretation, which makes comparing different models or model versions challenging. The abundance of suites and custom diagnostics highlights the need for a rigorous methodology to assess and compare the existing metrics, identify redundancies, select a minimal set, and combine them into a single workflow with a generic interface for the objective evaluation of climate projections. Within the HANAMI project, our research focuses on defining a robust, reliable set of scalar metrics to assess the scientific skill of climate models. Similar to CPMIP, which introduced platform- and model-independent metrics for computational performance [5], we provide an initial set of widely applicable, efficient, and scientifically relevant metrics. We aim to refine this set so that researchers can easily share and contrast results across diverse models, HPC platforms, and locations. Furthermore, we present an evaluation of the selected metrics on real climate data produced by a coupled ESM.
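To illustrate what reducing a spatial bias plot to scalar metrics can look like, the sketch below computes an area-weighted mean bias and root-mean-square error between a model field and a reference field on a regular latitude-longitude grid. It is only an illustration under stated assumptions: the function names `area_weights` and `scalar_skill`, the cosine-of-latitude weighting, and the toy temperature values are hypothetical choices and are not taken from the HANAMI metric set or from any of the cited suites.

```python
import math


def area_weights(lats_deg):
    """Cosine-of-latitude weights, normalised to sum to 1 (assumed regular grid)."""
    raw = [math.cos(math.radians(lat)) for lat in lats_deg]
    total = sum(raw)
    return [w / total for w in raw]


def scalar_skill(model, reference, lats_deg):
    """Return (bias, rmse), area-weighted over a regular lat-lon grid.

    model, reference: lists of rows, one row per latitude,
    each row holding one value per longitude.
    """
    weights = area_weights(lats_deg)
    n_lon = len(model[0])
    bias = 0.0
    mse = 0.0
    for w, m_row, r_row in zip(weights, model, reference):
        for m, r in zip(m_row, r_row):
            diff = m - r
            bias += w * diff / n_lon         # weighted mean error
            mse += w * diff * diff / n_lon   # weighted mean squared error
    return bias, math.sqrt(mse)


# Toy near-surface temperatures in kelvin (made-up values, not real model output).
lats = [-60.0, 0.0, 60.0]
reference = [[270.0, 271.0], [300.0, 301.0], [272.0, 273.0]]
model = [[271.0, 272.0], [300.0, 301.0], [271.0, 272.0]]

bias, rmse = scalar_skill(model, reference, lats)
print(f"bias = {bias:.3f} K, rmse = {rmse:.3f} K")
```

In this toy case the warm error at one latitude cancels the cold error at the other, so the mean bias is essentially zero while the RMSE is not. This is one reason a single scalar is rarely sufficient and a small, complementary set of metrics is preferable.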

[1] G. Flato et al., 2013, https://www.ipcc.ch/site/assets/uploads/2018/02/WG1AR5_Chapter09_FINAL.pdf
[2] Y.Y. Planton et al., 2021, https://doi.org/10.1175/BAMS-D-19-0337.1
[3] M.-S. Ahn et al., 2023, https://doi.org/10.5194/gmd-16-3927-2023
[4] M. Nurisso et al., 2024, https://github.com/DestinE-Climate-DT/AQUA
[5] V. Balaji et al., 2017, https://doi.org/10.5194/gmd-10-19-2017

Presentation information

[A-AS05] Weather, Climate, and Environmental Science Studies using High-Performance Computing
