Keywords: Soil carbon, Dataset, Earth system model
Soil is the largest carbon stock in the terrestrial ecosystem. Therefore, understanding soil carbon dynamics is essential for predicting future climate change. In the last two decades, several global soil datasets have been developed, and some are still being improved. These datasets contain the global distributions of soil physicochemical properties, from which the global distribution of the soil organic carbon (SOC) stock can be calculated, and some datasets provide the SOC stock directly. The datasets are based on data points observed around the globe, although the spatial distribution and density of these points are uneven. Earth system models (ESMs) have been created to understand the current climate and to project future climate conditions. These models incorporate the terrestrial carbon cycle, including SOC. However, it has been reported that ESM results agree only moderately with observations at the biome level, and that the correlation between the SOC stock distribution simulated by the ESMs and that of observational datasets is poor when the two are compared at a fine scale (e.g., 1° scale). In this study, we identified the key factors governing global SOC distribution in observational datasets and in ESM simulations. We applied a data mining scheme and boosted regression trees (Elith et al., 2008) to identify influential factors and to characterize how these factors are related to the SOC stock. Comparing the outputs revealed similarities and differences between the observational and ESM datasets. The results of this study will be useful for understanding the nature of observational SOC datasets and ESM outputs, and for improving the terrestrial carbon dynamics model in ESMs.