11:15 AM - 11:30 AM
[SCG55-03] MACHINE LEARNING MULTIVARIATE ANALYSIS FOR INJECTION-INDUCED SEISMICITY RISK EVALUATION
Keywords: Machine Learning, Table data analysis
The risk of induced seismicity associated with fluid injection has become a concern for the further implementation of various types of subsurface development. This risk is often assessed prior to injection with physical and statistical models that consider the local seismic activity and engineering parameters, such as the amount of fluid to be injected. However, many geothermal projects have experienced earthquakes much larger than the maximum magnitude expected from the injection volume, so it is difficult to say that induced seismic risk has been evaluated successfully. We therefore investigated parameters that may have a hidden causal effect on the magnitude of induced seismicity, using tabular data accumulated from injection-induced earthquake case studies. The data were collected from published literature on injection-induced seismicity, mainly from geothermal and oil and gas development, and compiled in tabular form. Note that some values are missing because the information reported differs from field to field. First, the correlation between each parameter and magnitude was evaluated using Spearman's rank correlation coefficient to find hidden second- and third-order critical parameters for the induced seismic magnitude. The calculated correlation coefficients were tested at a significance level of 5% to confirm the validity of each correlation. The results show relatively strong positive correlations for parameters such as maximum fluid injection rate, total injected volume, vertical stress, and the size of the seismogenic zone (maximum, intermediate, and minimum principal axes). These results indicate that the injected fluid volume and the dimensions of the seismogenic zone are correlated with magnitude, as has been proposed in previous physics-based models.
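The correlation screening described above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not state how the significance test was computed, so a two-sided permutation test is assumed here, and missing entries are handled by dropping incomplete pairs. The function names and the toy values are hypothetical.

```python
import random
from statistics import mean

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; applied to ranks it gives Spearman's rho."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman_with_pvalue(x, y, n_perm=2000, seed=0):
    """Spearman's rho on rows where both values are present, plus a
    two-sided permutation p-value (an assumed choice of test)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 3:
        raise ValueError("need at least 3 complete pairs")
    rx = ranks([a for a, _ in pairs])
    ry = ranks([b for _, b in pairs])
    rho = pearson(rx, ry)
    rng = random.Random(seed)
    shuffled = ry[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Count shuffles whose |rho| is at least as extreme as the observed one.
        if abs(pearson(rx, shuffled)) >= abs(rho) - 1e-12:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return rho, p_value, len(pairs)

# Made-up toy values for illustration only (None marks a missing entry).
injected_volume = [1.0e3, 5.0e3, None, 2.0e4, 8.0e4, 1.5e5, 3.0e5, 6.0e5]
max_magnitude = [0.8, 1.1, 1.5, 1.9, None, 2.6, 3.1, 3.4]

rho, p, n = spearman_with_pvalue(injected_volume, max_magnitude)
print(f"rho={rho:.2f}, p={p:.4f}, n={n}, significant at 5%: {p < 0.05}")
```

In practice the same screening would be looped over every parameter column against magnitude. A library routine such as `scipy.stats.spearmanr` could replace the hand-written functions; the pure-Python version is used here only to keep the sketch dependency-free.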
On the other hand, parameters such as injectivity, reservoir pressure, and permeability were negatively correlated with magnitude. This result has not been proposed before, and care should be taken in interpreting the dependencies when discussing causality. Second, a machine learning model was used for a regression task: predicting the maximum magnitude of injection-induced seismicity. To compensate for the missing values in the table, we performed data completion using the median, the mean, the K-nearest neighbor method, and a random forest model. LightGBM, a gradient-boosting model based on decision trees, was used as the machine learning model. LightGBM is robust to outliers and can learn even from data that include missing values. For each task in this study, we repeated the trials multiple times (n=30) in the following order: model training, parameter tuning, test inference, and visualization of prediction results. The results showed that the model trained on data completed by the random forest model was the most accurate for this task. The predictions were visualized and analyzed with the SHapley Additive exPlanations (SHAP) library, which showed that parameters such as the size of the seismogenic zone contributed strongly to the prediction in this regression task. This study, like other physics- and statistics-based models, showed that there is a reasonable relationship between several parameters and induced seismicity magnitude. The analysis of the table data shows the possibility of finding hidden correlations and causality between some parameters and magnitude, and is expected to be further used in assessing the risk of injection-induced earthquakes.
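The data completion step described above can be illustrated with a small sketch. It covers only three of the four methods named in the abstract (median, mean, and K-nearest neighbor); the random forest completion and the LightGBM regression are not reproduced. The function names, the choice of k, the distance definition, and the toy table are all assumptions made for illustration, and the columns are assumed to be already on comparable scales.

```python
from statistics import mean, median

def impute_column_stat(rows, stat):
    """Fill each missing value (None) with a per-column statistic."""
    n_cols = len(rows[0])
    fills = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        fills.append(stat(observed))
    return [[fills[c] if r[c] is None else r[c] for c in range(n_cols)]
            for r in rows]

def _distance(a, b):
    """Root-mean-square difference over columns observed in both rows."""
    shared = [(x - y) ** 2 for x, y in zip(a, b)
              if x is not None and y is not None]
    if not shared:
        return float("inf")
    return (sum(shared) / len(shared)) ** 0.5

def impute_knn(rows, k=2):
    """Fill each missing value with the mean of that column over the
    k nearest rows in which the column is observed."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for c, value in enumerate(row):
            if value is not None:
                continue
            donors = [(_distance(row, other), other[c])
                      for j, other in enumerate(rows)
                      if j != i and other[c] is not None]
            donors.sort(key=lambda d: d[0])
            filled[i][c] = mean([v for _, v in donors[:k]])
    return filled

# Made-up toy table for illustration only; None marks a missing entry.
# Hypothetical columns: log10 injected volume, log10 max injection rate,
# seismogenic zone length.
table = [
    [3.0, 1.0, 0.5],
    [3.2, 1.1, None],
    [5.0, 2.0, 2.0],
    [5.1, None, 2.2],
    [None, 1.9, 1.9],
]

median_filled = impute_column_stat(table, median)
mean_filled = impute_column_stat(table, mean)
knn_filled = impute_knn(table, k=2)

for name, result in [("median", median_filled), ("mean", mean_filled),
                     ("knn", knn_filled)]:
    print(name, result)
```

Each completed table would then be passed to the regression model so that the completion methods can be compared on prediction accuracy, as the abstract describes. Library implementations such as scikit-learn's `SimpleImputer` and `KNNImputer` would normally be preferred over hand-written versions.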