Japan Geoscience Union Meeting 2023

Presentation information

[J] Oral

M (Multidisciplinary and Interdisciplinary) » M-GI General Geosciences, Information Geosciences & Simulations

[M-GI29] Data-driven geosciences

Sun. May 21, 2023 9:00 AM - 10:15 AM 301B (International Conference Hall, Makuhari Messe)

Convener: Tatsu Kuwatani (Japan Agency for Marine-Earth Science and Technology), Hiromichi Nagao (Earthquake Research Institute, The University of Tokyo), Kenta Ueki (Japan Agency for Marine-Earth Science and Technology), Shin-ichi Ito (The University of Tokyo); Chairperson: Tatsu Kuwatani (Japan Agency for Marine-Earth Science and Technology), Kenta Ueki (Japan Agency for Marine-Earth Science and Technology), Hiromichi Nagao (Earthquake Research Institute, The University of Tokyo), Shin-ichi Ito (The University of Tokyo)

9:45 AM - 10:00 AM

[MGI29-04] Machine learning-based method for concentration simulation and source apportionment of Zn, Cu and Pb in Kosaka River, Northeast Japan

*Denghui Zhu¹, Jiajie Wang¹, Noriyoshi Tsuchiya¹ (1. Graduate School of Environmental Studies, Tohoku University)


Keywords: heavy metal pollutants, machine learning technique, simulate, contamination sources, pollution level, river system

Heavy metals from both anthropogenic activities and natural sources can enter rivers through various pathways, posing a threat to human health and the natural environment. To characterize the behavior of heavy metal pollutants for remediation, traditional methods require long-term sampling and laboratory analysis of water and sediments, which is costly and inefficient and makes it hard to identify contamination sources. Furthermore, traditional methods result in an overestimation of the heavy metal pollution level in areas where heavy metals accumulate naturally at high concentrations. Given the promising advantages of machine learning techniques for processing big data, evaluating water quality parameters, and apportioning sources, this research aims to develop a machine learning-based model in order to: 1. simulate the concentrations of heavy metals and identify the relative contributions of heavy metal sources in the Kosaka River system in the Hokuroku mining area; 2. assess the heavy metal loads in tributaries; 3. estimate the background concentrations of heavy metals to evaluate the pollution level.

We collected river water and sediment samples along the Kosaka River. The analytical results indicated a high possibility of Zn pollution in the tributaries. To trace the sources of heavy metals, the study area was divided into individual tributary polygons and mainstream polygons based on digital elevation model (DEM) data in QGIS. For an individual tributary polygon, the heavy metal loads in the tributary were assumed to depend only on the features within that polygon, such as geological features, land use type, precipitation, and mine site information. For the mainstream, the heavy metal loads depended on both the internal features of the mainstream polygon and the tributaries merging into the mainstream. Several machine learning algorithms, such as random forest (RF) and support vector machine (SVM), were applied to build the models, so that the results of different algorithms could be checked against each other. Selected internal features of the polygons were used as input variables to develop models for predicting heavy metal concentrations in river water and sediment.
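The polygon dependency described above can be illustrated with a minimal sketch. It is not the authors' model. It assumes a simple additive mass balance with no losses along the channel, and the `Polygon` class, the `mainstream_loads` function, the polygon names, and all load values are hypothetical placeholders. In the study, the internal contribution of each polygon would come from a machine learning model driven by the polygon's features rather than from fixed numbers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Polygon:
    """One catchment polygon (hypothetical structure for illustration)."""
    name: str
    # Load attributed to the polygon's own features (arbitrary units).
    internal_load: float
    # Tributary polygons that merge into this polygon (mainstream only).
    tributaries: List["Polygon"] = field(default_factory=list)


def mainstream_loads(segments: List[Polygon]) -> List[float]:
    """Return the cumulative load at the outlet of each mainstream polygon.

    Segments are ordered upstream to downstream. Assumption: loads simply
    add up (no deposition, dilution, or other losses).
    """
    loads = []
    carried = 0.0  # load arriving from upstream mainstream polygons
    for segment in segments:
        tributary_input = sum(t.internal_load for t in segment.tributaries)
        carried += segment.internal_load + tributary_input
        loads.append(carried)
    return loads


# Made-up example: two mainstream polygons fed by three tributaries.
segments = [
    Polygon("M1", 1.0, [Polygon("T1", 2.0)]),
    Polygon("M2", 0.5, [Polygon("T2", 0.25), Polygon("T3", 0.25)]),
]
loads = mainstream_loads(segments)
print(loads)  # prints [3.0, 4.0]
```

A tributary polygon here carries only its own `internal_load`, mirroring the statement that tributary loads depend only on features inside the polygon, while each mainstream value also includes everything merging in upstream of its outlet.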

Preliminary modeling results indicate that the method is feasible and effective. However, some input variables need to be adjusted to make the model more accurate. At the current stage, sensitivity analysis is being conducted to refine the input variables and calculate their importance. Variable importance can be used to reflect the sources of heavy metals and to estimate their background concentrations. After the input variables are readjusted, the coefficient of determination (R²) and the mean absolute error (MAE) will be used as statistical measures to evaluate the performance of the applied machine learning algorithms. The optimal machine learning algorithm will then be selected.
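As a minimal sketch of the two evaluation measures named above, the following computes the coefficient of determination and the mean absolute error from their standard definitions and picks the candidate with the highest coefficient of determination. The function names, the labels `model_a` and `model_b`, and all numbers are made-up placeholders rather than results from the study, and choosing by the coefficient of determination alone is an assumption, since the abstract does not state the selection rule.

```python
def mean_absolute_error(observed, predicted):
    """MAE: average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up concentrations (arbitrary units) and two hypothetical candidates.
observed = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_a": [1.5, 2.0, 2.5, 4.0],
    "model_b": [2.0, 2.0, 3.0, 3.0],
}

scores = {
    name: (r_squared(observed, pred), mean_absolute_error(observed, pred))
    for name, pred in predictions.items()
}
for name, (r2, mae) in scores.items():
    print(f"{name}: R2={r2:.2f}, MAE={mae:.2f}")
# prints "model_a: R2=0.90, MAE=0.25" then "model_b: R2=0.60, MAE=0.50"

# Assumed selection rule: highest R2 wins.
best = max(scores, key=lambda name: scores[name][0])
print(best)  # prints model_a
```

The two measures do not always rank candidates identically, so in practice both are worth reporting side by side before one algorithm is chosen.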