5:15 PM - 6:30 PM
[HGM03-P02] Exploring a suitable sampling method for estimating the spatial variation in soil thickness in a small mountainous catchment
Keywords: Soil thickness, Spatial model, Digital soil mapping, Optimized spatial sampling
The spatial variability of soil thickness in the mountainous and hilly areas of Japan remains uncertain. This is also a major source of uncertainty in hydrological models and in regional risk assessment. Recently, mapping techniques (spatial prediction) for soil thickness, which is strongly correlated with environmental factors such as topography, have become more powerful owing to (1) progress in machine learning methods, which has drastically increased the accuracy of soil property estimation, and (2) the development of big data from satellite remote sensing, which provides high-definition DEMs as environmental factors. With these technologies, efficient spatial estimation of soil layer thickness can be expected.
On the other hand, spatial estimation by machine learning differs significantly from traditional kriging based on a geostatistical approach. Although grid sampling at appropriate intervals is ideal for kriging, what matters in machine learning (here, a method based on regression trees) is the combination of explanatory and objective variables covered by the samples. To address this, Conditioned Latin Hypercube Sampling (cLHS) and (Fuzzy) k-means optimized sampling have been proposed. Optimized sampling for machine learning should be an appropriate approach in the mountainous and hilly areas of Japan, where the correlation between topographic factors and soil property values is strong. The purpose of this study was to verify how much optimized sampling improves the spatial estimation of soil layer thickness, through comparison with the conventional random and grid point methods.
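As a rough illustration of the k-means idea mentioned above, the following minimal sketch clusters candidate cells in covariate space and picks the cell nearest to each cluster centre as a sampling location. It is not the implementation used in this study: the function name `kmeans_sample`, the iteration count, and the synthetic covariates are all assumptions made for illustration, and the covariates are assumed to be already standardized.

```python
import math
import random


def kmeans_sample(features, n_samples, n_iter=20, seed=0):
    """Return n_samples distinct row indices chosen by k-means in covariate space.

    `features` is a list of equal-length numeric tuples (one per candidate cell),
    assumed to be standardized beforehand. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    # Initialise centroids from randomly chosen distinct cells.
    start = rng.sample(range(len(features)), n_samples)
    centroids = [list(features[i]) for i in start]

    for _ in range(n_iter):
        # Assignment step: group each cell with its nearest centroid.
        groups = [[] for _ in range(n_samples)]
        for row in features:
            k = min(range(n_samples), key=lambda c: math.dist(row, centroids[c]))
            groups[k].append(row)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c, members in enumerate(groups):
            if members:
                centroids[c] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]

    # Selection step: the real cell closest to each centroid becomes a sample,
    # skipping cells that were already selected so indices stay distinct.
    chosen = []
    for centre in centroids:
        candidates = (i for i in range(len(features)) if i not in chosen)
        chosen.append(min(candidates, key=lambda i: math.dist(features[i], centre)))
    return chosen


# Synthetic stand-in for standardized terrain attributes (not real data).
_rng = random.Random(1)
cells = [tuple(_rng.uniform(0, 1) for _ in range(3)) for _ in range(500)]
picked = kmeans_sample(cells, n_samples=10)
```

Because the clusters are formed in covariate space rather than geographic space, the selected cells tend to spread across the range of terrain conditions, which is the property that tree-based regression is expected to benefit from.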
In addition to the three optimized sampling methods (k-means, Fuzzy-k-means, and cLHS), random and (pseudo)grid point methods were used for comparison. For each method, simulated sampling was conducted with different sample sizes (N = 50, 100, ..., 400). Machine learning mapping (spatial estimation) was performed using each obtained sample set, and the accuracy for each method and sample size was evaluated by R2 and RMSE calculated from the validation dataset. One hundred iterations were performed for each sampling type. Terrain attributes (e.g., slope, curvature, TWI) calculated from a 2 m DEM resampled from AW3D-DTM (1 m) were used as explanatory variables for both optimal point selection and machine learning mapping.
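The two accuracy measures named above can be sketched as follows. The abstract does not state which definition of R2 was used; this sketch assumes the coefficient of determination (1 minus the ratio of residual to total sum of squares), and the function names and the toy values are purely illustrative.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r2(observed, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy validation values (not measurements from this study).
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 5.0]
scores = {"RMSE": rmse(obs, pred), "R2": r2(obs, pred)}
```

Computing both scores for each of the repeated sampling runs gives a distribution per method and sample size, which is what allows the spread of accuracy to be compared as well as its average.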
As a result, when the number of samples was small, map estimation with the optimized sampling methods was more accurate than with the conventional random and grid point methods. As the number of samples increased, however, the gain in accuracy from the larger sample size outweighed the differences between the methods. Random sampling showed a very wide range of accuracy because of chance, and the accuracy of grid sampling could also change depending on the origin of the grid. The predicted maps showed that optimized sampling reproduced the rough spatial patterns even with about 100-200 samples. Such spatial patterns are difficult to obtain by kriging, suggesting that machine learning mapping combined with optimized sampling is an effective spatial estimation method for the catchment. This indicates the possibility of overcoming the problems faced by conventional random and grid point sampling in the complex terrain of the mountainous regions of Japan.