Machine learning-based method for concentration simulation and source apportionment of Zn, Cu and Pb in Kosaka River, Northeast Japan

Denghui Zhu; Jiajie Wang; Noriyoshi Tsuchiya

9:45 AM - 10:00 AM

[MGI29-04] Machine learning-based method for concentration simulation and source apportionment of Zn, Cu and Pb in Kosaka River, Northeast Japan

*Denghui Zhu¹, Jiajie Wang¹, Noriyoshi Tsuchiya¹ (1.Graduate School of Environmental Studies, Tohoku University)

Keywords: heavy metal pollutants, machine learning technique, simulate, contamination sources, pollution level, river system

Heavy metals from both anthropogenic activities and natural sources can enter rivers through various pathways, posing a threat to human health and the natural environment. To characterize the behavior of heavy metal pollutants for remediation, traditional methods require long-term sampling and laboratory analysis of water and sediments, which is costly and inefficient and makes it hard to identify contamination sources. Furthermore, traditional methods overestimate the heavy metal pollution level in areas where heavy metals accumulate naturally at high concentrations. Given the promise of machine learning techniques for processing big data, evaluating water quality parameters, and apportioning sources, this research aims to develop a machine learning-based model to: 1. simulate the concentrations of heavy metals and identify the relative contributions of heavy metal sources in the Kosaka River system in the Hokuroku mining area; 2. assess the heavy metal loads in tributaries; 3. estimate the background concentrations of heavy metals to evaluate the pollution level.

We collected river water and sediment samples along the Kosaka River. The analytical results indicated a high possibility of Zn pollution in the tributaries. To trace the sources of heavy metals specifically, the study area was divided into individual tributary polygons and mainstream polygons based on digital elevation model (DEM) data in QGIS. For an individual tributary polygon, the heavy metal loads in the tributary were assumed to depend only on the features within that polygon, such as geological features, land use type, precipitation, and mine site information. For the mainstream, the heavy metal loads depended on both the internal features of the mainstream polygon and the tributaries merging into the mainstream. Several machine learning algorithms, such as random forest (RF) and support vector machine (SVM), were applied to build the models so that the results of different algorithms could be checked against each other. Selected internal features of the polygons were used as input variables to develop models for predicting heavy metal concentrations in river water and sediment.
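The polygon scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Polygon` class, the `model_inputs` helper, the feature names, and the `toy_model` stand-in for a trained regressor are all hypothetical, and the numbers are made up. Passing the tributary contribution to the mainstream as a summed predicted load is also an assumption; the abstract only states that mainstream loads depend on the merging tributaries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Features = Dict[str, float]


@dataclass
class Polygon:
    """One catchment polygon; `tributaries` stays empty for a tributary polygon."""
    name: str
    features: Features
    tributaries: List["Polygon"] = field(default_factory=list)


def model_inputs(polygon: Polygon, tributary_model: Callable[[Features], float]) -> Features:
    """Build the input row for one polygon.

    A tributary polygon uses only its own internal features. A mainstream
    polygon additionally receives the summed predicted load of the
    tributaries merging into it (summation is an assumption of this sketch).
    """
    row = dict(polygon.features)
    if polygon.tributaries:
        row["tributary_load"] = sum(
            tributary_model(model_inputs(t, tributary_model))
            for t in polygon.tributaries
        )
    return row


def toy_model(row: Features) -> float:
    """Hypothetical stand-in for a trained regressor (not a fitted model)."""
    return 2.0 * row["mine_sites"] + 0.5 * row["precipitation"]


# Illustrative values only, not measured data.
t1 = Polygon("T1", {"mine_sites": 1.0, "precipitation": 2.0})
t2 = Polygon("T2", {"mine_sites": 0.0, "precipitation": 4.0})
main = Polygon("M1", {"mine_sites": 0.0, "precipitation": 3.0}, [t1, t2])

tributary_row = model_inputs(t1, toy_model)  # internal features only
main_row = model_inputs(main, toy_model)     # internal features + tributary load
print(main_row)
```

In practice, `toy_model` would be replaced by a fitted RF or SVM regressor, and the feature dictionary would hold the geological, land use, precipitation, and mine site variables extracted for each polygon.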

The preliminary modeling results indicate that the method is feasible and effective. However, some input variables need to be adjusted to make the model more accurate. At the current stage, sensitivity analysis is being conducted to refine the input variables and calculate their importance. Variable importance can be used to reflect the sources of heavy metals and to estimate their background concentrations. After the input variables are readjusted, the coefficient of determination (R²) and the mean absolute error (MAE) will be used as statistical measures to evaluate the performance of the applied machine learning algorithms, and the best-performing algorithm will then be selected.
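The two evaluation measures named above can be sketched in plain Python as follows. This is an illustration only: the function names, the labels `model_a` and `model_b`, and all numbers are hypothetical and do not come from the study. Selecting the candidate with the highest R² is an assumed rule; the abstract does not say how the two measures are combined.

```python
from typing import Dict, Sequence


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """MAE = mean of |observed - predicted|."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """R² = 1 - SS_res / SS_tot, with SS_tot taken about the observed mean."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


# Made-up concentrations and predictions from two hypothetical candidate models.
observed = [1.0, 2.0, 3.0, 4.0]
predictions: Dict[str, Sequence[float]] = {
    "model_a": [1.0, 2.0, 3.0, 5.0],
    "model_b": [2.0, 2.0, 3.0, 3.0],
}

scores = {
    name: {"r2": r_squared(observed, pred), "mae": mean_absolute_error(observed, pred)}
    for name, pred in predictions.items()
}

# Assumed selection rule: highest R² wins.
best = max(scores, key=lambda name: scores[name]["r2"])
print(scores, best)
```

A higher R² and a lower MAE both indicate a better fit, so the two measures usually agree on the ranking; when they disagree, the selection rule needs to be stated explicitly.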

Presentation information

[M-GI29] Data-driven geosciences
