5:15 PM - 6:30 PM
[ACG36-P15] Comparison between random forest and multiple linear regression algorithms for digital soil mapping in the Thung Kula Ronghai region, Thailand
Keywords: Predictor variables, Remote sensing, Spatial distribution, Spectral and terrain indices, Soil property
Digital soil mapping (DSM) increasingly uses machine learning (ML) algorithms to identify relationships between soil properties and environmental variables, enabling the prediction of soil nutrient levels. Over the past decades, many studies have employed the multiple linear regression (MLR) algorithm to estimate the spatial distribution of soil chemical properties in various landscapes. The Thung Kula Ronghai (TKR) region of Thailand is an important agricultural area that produces high-quality jasmine rice, which has been registered as a Protected Geographical Indication (PGI) by the European Union. However, the rice yield in this region is lower than that in other regions of the country. In this study, we compared random forest (RF), which is the most popular ML algorithm for digital soil mapping, with MLR for mapping the spatial distribution of soil chemical properties in the TKR region. The algorithms were compared on the basis of three factors: (1) model accuracy, (2) selection of predictor variables, and (3) the spatial distribution characteristics of soil properties. The dataset consisted of 186 soil samples collected from the surface layer (0-30 cm) and analyzed for nutrients. Landsat-8 images at 30 m resolution, acquired under bare-land conditions, were used to calculate the spectral indices. A digital elevation model with 5 m resolution was used to derive the terrain variables of the study area. Soil properties were estimated from the predictor variables using MLR as a simple model and RF as a complex model. Ten-fold cross-validation was used to determine model accuracy. The RF and MLR models were evaluated in terms of the coefficient of determination (R²), root mean square error (RMSE), and normalized root mean square error (NRMSE). The results demonstrated that both the RF and MLR models successfully produced digital soil maps of various soil properties. 
The brightness, saturation, coloration, normalized difference water, and moisture stress spectral indices were the important predictor variables and were significantly correlated with various soil properties. The RF predictions showed higher accuracy than those of MLR for most of the soil properties. The RF model also produced more realistic results in terms of the correlation between predicted and measured soil data, indicating that RF was more appropriate than MLR for producing digital maps of soil chemical properties in the TKR region.
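The evaluation workflow described above (RF versus MLR, ten-fold cross-validation, then R², RMSE, and NRMSE) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the six predictor columns and the target formula are hypothetical stand-ins for spectral and terrain indices, the use of scikit-learn and the hyperparameters are choices made for this sketch, and NRMSE is assumed to be RMSE divided by the mean of the measured values, because the abstract does not state which normalization was used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic stand-in data (assumption): 186 rows to match the number of
# soil samples in the abstract; six hypothetical predictor columns.
rng = np.random.default_rng(0)
n_samples = 186
X = rng.uniform(0.0, 1.0, size=(n_samples, 6))
# Hypothetical soil property with a partly nonlinear dependence on predictors.
y = (2.0 * X[:, 0] + np.sin(6.0 * X[:, 1]) + X[:, 2] * X[:, 3]
     + rng.normal(0.0, 0.1, n_samples))


def evaluate(model, X, y, n_splits=10, seed=0):
    """Ten-fold cross-validated R2, RMSE, and NRMSE for one model.

    Metrics are computed on the pooled out-of-fold predictions.
    NRMSE = RMSE / mean(y) is an assumed normalization.
    """
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    y_pred = cross_val_predict(model, X, y, cv=cv)
    rmse = float(np.sqrt(mean_squared_error(y, y_pred)))
    return {
        "r2": float(r2_score(y, y_pred)),
        "rmse": rmse,
        "nrmse": rmse / float(np.mean(y)),
    }


# Both models are scored on identical folds so the comparison is like for like.
results = {
    "MLR": evaluate(LinearRegression(), X, y),
    "RF": evaluate(RandomForestRegressor(n_estimators=200, random_state=0), X, y),
}

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Scoring both models on the same shuffled folds keeps the comparison fair; with real data, `X` would hold the spectral and terrain predictors at the sample locations and `y` one measured soil property at a time.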