Japan Geoscience Union Meeting 2024

Presentation information

[J] Poster presentation

Session symbol M (Multidisciplinary and Interdisciplinary) » M-GI General Geosciences, Information Geosciences

[M-GI28] Data-driven geosciences

Mon. May 27, 2024, 17:15 〜 18:45, Poster venue (Makuhari Messe International Exhibition Hall, Hall 6)

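Conveners: 桑谷 立 (Japan Agency for Marine-Earth Science and Technology), 長尾 大道 (Earthquake Research Institute, The University of Tokyo), 上木 賢太 (Japan Agency for Marine-Earth Science and Technology), 伊藤 伸一 (The University of Tokyo)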

17:15 〜 18:45

[MGI28-P07] Development of a machine learning-based method for predicting the concentrations and identifying the sources of heavy metals in river water

*朱 登輝1, 島田 智久1, 王 佳婕1, 土屋 範芳1 (1. Graduate School of Environmental Studies, Tohoku University)

Keywords: heavy metal pollution, machine learning, prediction, source analysis

Conventional assessment of heavy metal pollution in river systems requires long-term sampling of river water followed by laboratory analysis, which is time-consuming, laborious, and costly. In addition, previous studies have commonly used principal component analysis (PCA) to identify the sources of heavy metals. However, some studies report that PCA makes variables less interpretable and results in information loss. Given the development of machine learning techniques and their strength in prediction, this research aims to develop an efficient method that applies a recent interpretable machine learning technique to easily obtained source data for heavy metals (mine, industrial and domestic wastewater, geological background, soil features, land use type, vegetation, elevation, water discharge, precipitation, pH, temperature) in order to: 1. predict the concentrations of heavy metals in river water; 2. quantitatively identify the pollutant load contribution of each pollution source.

We collected 160 river water samples from the Yoneshiro River and the Kosaka River, both located in Akita Prefecture, Japan, and measured the concentrations of Pb, Zn, Cu, and Cd in the samples. The source data described above were used as input variables and the measured Pb, Zn, Cu, and Cd concentrations as output variables to train a random forest (RF) model for predicting those concentrations. SHapley Additive exPlanations (SHAP) was used to identify the sources of the heavy metals. The coefficient of determination (R²) and the mean square error (MSE) were used to evaluate the performance of the resulting model. The results showed that the model predicted the concentrations of Pb, Zn, Cu, and Cd well, with R² of 0.97, 0.93, 0.95, and 0.99 and MSE of 10.25, 0.46, 0.011, and 0.002, respectively. SHAP provided quantitative source identification for Pb, Zn, Cu, and Cd, not only for the whole study area but also for each sampling point. The SHAP results were verified by comparison with the PCA results.
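
As an illustration of the quantities named above, the following minimal Python sketch computes R² and MSE and derives exact Shapley attributions for a single prediction. It is not the authors' pipeline: `toy_model`, its three features, and all numeric values are hypothetical stand-ins for the trained random forest and the real source data. Features left out of a coalition are simply replaced by a baseline value, which is a simplification of how SHAP treats missing features.

```python
from itertools import combinations
from math import factorial


def mse(y_true, y_pred):
    """Mean square error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Assumption: a feature outside the coalition takes its baseline value.
    The cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_model(features):
    """Hypothetical stand-in for a fitted model (not the study's RF)."""
    mine, wastewater, discharge = features
    return 2.0 * mine + 0.5 * wastewater + 0.1 * mine * discharge


# Hypothetical sampling point and reference (baseline) point
x = [3.0, 4.0, 10.0]
baseline = [1.0, 2.0, 5.0]
phi = shapley_values(toy_model, x, baseline)
```

The attributions in `phi` sum to `toy_model(x) - toy_model(baseline)`, which is the additivity property that allows a predicted concentration at one sampling point to be split into per-source contributions.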