Japan Geoscience Union Meeting 2025

Presentation information

[E] Oral

A (Atmospheric and Hydrospheric Sciences) » A-CG Complex & General

[A-CG38] Climate Variability and Predictability on Subseasonal to Centennial Timescales

Wed. May 28, 2025 9:00 AM - 10:30 AM 101 (International Conference Hall, Makuhari Messe)

Convener: Takahito Kataoka (JAMSTEC Japan Agency for Marine-Earth Science and Technology), Hiroyuki Murakami (Geophysical Fluid Dynamics Laboratory), Yushi Morioka (Japan Agency for Marine-Earth Science and Technology), Nathaniel C Johnson (NOAA Geophysical Fluid Dynamics Laboratory)
Chairperson: Takahito Kataoka (JAMSTEC Japan Agency for Marine-Earth Science and Technology), Hiroyuki Murakami (Geophysical Fluid Dynamics Laboratory), Yushi Morioka (Japan Agency for Marine-Earth Science and Technology)

10:15 AM - 10:30 AM

[ACG38-06] Wasserstein Distance as a Tool for Analyzing Large-Ensemble Datasets

*Yuki Yasuda1, Shoichiro Kido2 (1. Institute of Science Tokyo, 2. Japan Agency for Marine-Earth Science and Technology)

Keywords: Large-Ensemble Simulation, Variability, Information Theory, Wasserstein Distance, Optimal Transport Theory

The atmosphere-ocean system exhibits two types of variability: forced variability in response to external forcing (e.g., changes in radiative forcing), and internal variability arising from chaotic behavior due to nonlinearity. The relative dominance of these components depends on the spatiotemporal scale of interest [1]. To quantify the relative contributions of forced and internal variability, large-ensemble simulations (LE-simulations) with atmosphere-ocean general circulation models have been conducted extensively over recent decades [2]. However, most studies using LE-simulation data still rely on conventional analysis methods that assume Gaussian distributions, although the underlying distributions can be non-Gaussian [3]. To address this, Sane et al. [4] applied information theory to propose a measure of internal variability strength, gSane. While this metric is applicable to non-Gaussian distributions, it is unclear whether it is also applicable to heavy-tailed distributions. Here, we present a new indicator, gW, based on the Wasserstein distance, which quantifies the distance between any two frequency distributions. We show that gW may be more suitable than gSane for analyzing LE-simulation data.

Consider all ensemble members at a given location, where X represents a physical quantity and Xave denotes its ensemble mean (see the top panels in the Figure). In data analysis, X comprises the time series of all ensemble members, and Xave is their ensemble-mean time series. Following Sane et al. [4], gSane is defined by Eq. (1) in the Figure, where I(X:Xave) is the mutual information, which measures the degree of nonlinear correlation between X and Xave, and H(X) is the Shannon entropy, which quantifies the uncertainty in X. The indicator gSane ranges from 0 to 1, with higher values indicating greater inter-member variability (i.e., weaker correlation between X and Xave).
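
As a rough illustration of how such an indicator can be estimated from data, the sketch below uses a plain histogram (plug-in) estimate of I(X:Xave) and H(X). Eq. (1) itself appears only in the Figure, so the normalized form 1 - I(X:Xave)/H(X) used here is an assumption, chosen because it matches the stated range (0 to 1) and direction (larger for weaker correlation). The function name g_sane_sketch and the default bin count are hypothetical, and the bin count is a free parameter of this estimator.

```python
import numpy as np

def g_sane_sketch(x, bins=16):
    """Histogram-based sketch of a gSane-like indicator.

    x    : array of shape (n_members, n_times) at one location.
    bins : number of histogram bins per axis (a free parameter).

    ASSUMPTION: the indicator is taken to be 1 - I(X:Xave) / H(X);
    the actual Eq. (1) is only given in the Figure and may differ.
    """
    x = np.asarray(x, dtype=float)
    x_ave = x.mean(axis=0)  # ensemble-mean time series

    # Pair every member value with the ensemble mean at the same time step.
    samples_x = x.ravel()
    samples_ave = np.broadcast_to(x_ave, x.shape).ravel()

    # Joint and marginal probabilities from a 2-D histogram.
    joint, _, _ = np.histogram2d(samples_x, samples_ave, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1)
    p_ave = p_xy.sum(axis=0)

    # Plug-in mutual information and Shannon entropy (in nats).
    nz = p_xy > 0
    mi = np.sum(p_xy[nz] * np.log(p_xy[nz] / np.outer(p_x, p_ave)[nz]))
    h_x = -np.sum(p_x[p_x > 0] * np.log(p_x[p_x > 0]))

    # A constant field carries no entropy; return 0 by convention here.
    return 1.0 - mi / h_x if h_x > 0 else 0.0
```

Calling g_sane_sketch with different values of bins on the same data is a simple way to check how sensitive the estimate is to the binning.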

We propose a new indicator gW, defined by Eq. (2) in the Figure, where Xmed denotes the ensemble median of X and X(q) denotes the q-th quantile of X (similarly for Xmed(q)). The integral represents the Wasserstein distance between the frequency distributions of X and Xmed [5], and the normalization constant MAE is the mean absolute error of X from its median. Like gSane, gW ranges from 0 to 1 and is applicable to non-Gaussian distributions, with larger values indicating greater inter-member variability (i.e., the system is more chaotic than deterministic). Unlike gSane, however, gW requires no additional parameters, such as bin widths, for its computation.
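
The sketch below shows one way gW could be computed from an ensemble at a single location. Eq. (2) itself appears only in the Figure, so several choices are assumptions: the Wasserstein distance is taken to be of order 1 (the integral of |X(q) - Xmed(q)| over q from 0 to 1), MAE is taken to be the mean of |X - Xmed| over all members and time steps, and the names g_w_sketch and toy_ensemble are hypothetical. The toy ensemble is a generic linear trend plus Gaussian noise, not the toy model of Sane et al. [4].

```python
import numpy as np

def g_w_sketch(x):
    """Sketch of a gW-like indicator: W1(X, Xmed) / MAE.

    x : array of shape (n_members, n_times) at one location.

    ASSUMPTIONS: order-1 Wasserstein distance, and MAE defined as the
    mean of |X - Xmed| over all members and time steps.
    """
    x = np.asarray(x, dtype=float)
    n_members = x.shape[0]
    x_med = np.median(x, axis=0)  # ensemble-median time series

    # Empirical quantile functions on a common grid. X has n_members
    # times as many samples as Xmed, so each sorted median value is
    # repeated n_members times; the sorted matching is then the exact
    # 1-D optimal transport between the two empirical distributions.
    q_x = np.sort(x.ravel())
    q_med = np.repeat(np.sort(x_med), n_members)
    w1 = np.mean(np.abs(q_x - q_med))

    mae = np.mean(np.abs(x - x_med))
    # With no spread at all, both terms vanish; return 0 by convention.
    return w1 / mae if mae > 0 else 0.0

def toy_ensemble(forced_amp, noise_std, n_members=40, n_times=500, seed=0):
    """Hypothetical toy data: a shared linear trend plus member noise."""
    rng = np.random.default_rng(seed)
    forced = forced_amp * np.linspace(0.0, 1.0, n_times)
    noise = noise_std * rng.standard_normal((n_members, n_times))
    return forced + noise

# Noise-dominated versus forcing-dominated toy ensembles.
print(g_w_sketch(toy_ensemble(forced_amp=0.0, noise_std=1.0)))
print(g_w_sketch(toy_ensemble(forced_amp=10.0, noise_std=0.5)))
```

Under these assumptions the ratio cannot exceed 1, because pairing each member value with the median at the same time step is one admissible transport plan whose cost equals MAE, and the Wasserstein distance is the minimum over all such plans. No binning or other tuning parameter enters the calculation.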

We evaluated both indicators by applying them to a simple toy model, following Sane et al. [4], in which the magnitude of internal variability (i.e., the degree of stochasticity) was prescribed. As internal variability decreased, the dispersion between random variables decreased, and both indicators showed correspondingly lower values. We then applied both indicators to near-surface temperature data from the Community Earth System Model Large Ensemble (CESM LENS) 20th-century historical experiment with 40 ensemble members [2].

The bottom panels in the Figure show the spatial distributions of gSane and gW for near-surface temperature around Japan. Both indicators show lower values over land and higher values over the ocean, which likely reflects the more deterministic nature of temperature over land and its more chaotic nature over the ocean. While gSane exhibits a patchy pattern that is somewhat difficult to interpret, gW shows smooth transitions between land and ocean regions. This suggests that gW may be more suitable for analyzing large-ensemble simulation results. In our presentation, we will also discuss the pros and cons of the new indicator and compare it in detail with other existing metrics.

[1] Hawkins and Sutton (2009), BAMS.
[2] Kay et al. (2015), BAMS.
[3] Franzke et al. (2020), Rev. Geophys.
[4] Sane et al. (2024), JGR Oceans.
[5] Peyré and Cuturi (2020), arXiv:1803.00567.