Construction of a real-time seismic intensity prediction model using deep learning: evaluation of data augmentation with synthetic data

Ahyi KIM; Momoko NAKAMURA; Anju OHTA; Yukino YAZAKI; Hisahiko KUBO

4:30 PM - 4:45 PM

[SCG55-14] Construction of a real-time seismic intensity prediction model using deep learning: evaluation of data augmentation with synthetic data

*Ahyi KIM¹, Momoko NAKAMURA¹, Anju OHTA¹, Yukino YAZAKI¹, Hisahiko KUBO² (1.Yokohama City University, 2.National Research Institute for Earth Science and Disaster Resilience)

Keywords: Real-time seismic intensity, Deep learning, LSTM, Data augmentation, Earthquake early warning

Rapidly acquiring a detailed seismic intensity distribution during and immediately after an earthquake plays an important role in damage assessment, rescue activities, and recovery planning. The most direct way to increase the spatial resolution is to increase the number of seismic stations, but this is difficult from a cost standpoint, so higher resolution is currently achieved by interpolating the observed data with various types of geological information. However, the result depends greatly on the quantity and quality of that geological information, and the seismic intensity cannot always be estimated accurately at all locations. In this context, a deep learning model that does not require any geological information has been proposed for real-time seismic intensity prediction, and its usefulness has been demonstrated (Otake et al., 2020). In this study, we attempted to construct a real-time seismic intensity (Ir) prediction model using long short-term memory (LSTM), a type of deep learning model, for the area around Yokohama City, where much less training data is available than in previous studies.
The model is trained with data from the prediction target station and four surrounding stations, and predicts the shape of Ir at the target station, from the onset to the maximum seismic intensity, for data not used in training. Both the target and input stations belong to K-NET, operated by the National Research Institute for Earth Science and Disaster Resilience: KNG002 is the target station, and KNG004, KNG012, TKY007, and TKY021 are the four input stations. The dataset consists of the 252 earthquakes that occurred between 1996 and 2020 and were observed at all five of these stations. Of these events, 10% were held out as test data; the remainder was split into 90% training data and 10% validation data. As in previous studies, we built a model that predicts the current Ir of the target station using time series data from a certain time up to the present (0-second-ahead model). In addition, with application to earthquake early warning systems in mind, we examined whether the future Ir of the target station can be predicted. Specifically, since the distance between the input and target stations is about 30 km, we constructed a model that predicts Ir 8 seconds ahead (8-second-ahead model), assuming an S-wave velocity of 4 km/s (30 km / 4 km/s = 7.5 s, taken as 8 s). The mean absolute error (MAE) and root mean square error (RMSE) were used as indices to evaluate the prediction results. The results of the 8-second-ahead model were compared with two baselines: the average of the input stations weighted by distance, and the maximum value of the input stations, which imitates the PLUM method. The MAE and RMSE of the model were lower than those of the weighted average but higher than those of the maximum. In addition, although the rough shape of Ir was reproduced, large outliers were also observed. Since one of the major causes of these prediction errors is considered to be the lack of training data, we attempted data augmentation with synthetic data to evaluate this issue.
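The lead time, the two baselines, and the two error indices described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the abstract does not state the weighting scheme, so inverse-distance weights are assumed here, and the numbers in the demo are made-up toy values.

```python
import math


def lead_time_s(distance_km, s_wave_km_s):
    """S-wave travel time over the station separation, rounded up to whole seconds."""
    return math.ceil(distance_km / s_wave_km_s)


def weighted_average_baseline(ir_by_station, distances_km):
    """Distance-weighted average of the input-station Ir time series.

    ir_by_station: one list of Ir samples per input station (equal lengths).
    distances_km: distance from each input station to the target station.
    ASSUMPTION: weights are proportional to 1/distance.
    """
    weights = [1.0 / d for d in distances_km]
    total = sum(weights)
    n_samples = len(ir_by_station[0])
    return [
        sum(w * series[t] for w, series in zip(weights, ir_by_station)) / total
        for t in range(n_samples)
    ]


def max_baseline(ir_by_station):
    """PLUM-like baseline: the largest input-station Ir at each time step."""
    return [max(values) for values in zip(*ir_by_station)]


def mae(pred, obs):
    """Mean absolute error between predicted and observed Ir."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root mean square error between predicted and observed Ir."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


# Toy demo with made-up values (not data from the study).
print(lead_time_s(30, 4))  # 8, since 30 / 4 = 7.5 s rounds up to 8 s
inputs = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]]   # two hypothetical input stations
target = [1.2, 2.4, 3.1]                      # hypothetical target-station Ir
for name, pred in [
    ("weighted average", weighted_average_baseline(inputs, [25.0, 35.0])),
    ("maximum", max_baseline(inputs)),
]:
    print(name, mae(pred, target), rmse(pred, target))
```

Scoring the LSTM output with the same `mae` and `rmse` functions would place all three predictors on a common footing, which is the kind of comparison the paragraph reports.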
Here, data missing from the above period were supplemented with synthetic data, computed by combining the prediction equation for the shape of Ir from the onset to the maximum intensity by Kubo and Kunugi (2022) with the strong motion prediction equations by Tsukasa and Midorikawa (1999) and Midorikawa et al. (1999). This increased the number of Ir data used for training from 1026 to 1655 (observed data : simulated data = 1241 : 414). The prediction errors of this model were smaller than those of both the weighted average and the maximum value, indicating that data augmentation improved the prediction accuracy. In addition, when all of the above training data were replaced with synthetic data, the prediction performance was slightly better than when synthetic data were mixed with observed data. This result suggests that Ir prediction is possible even in areas with little observation data if a large amount of simulated data is available for training. In the future, we will generate synthetic data for a large number of scenario earthquakes to verify the improvement in prediction performance, and we will also examine whether Ir can be predicted using synthetic data at locations where no observed data are available.
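As a rough illustration of the two training configurations compared above (observed plus synthetic data, and synthetic data only), the following sketch assembles a tagged, shuffled training set. The function names and the `mode` values are hypothetical, and the integer placeholders stand in for real Ir time series; only the counts 1241 and 414 are taken from the text.

```python
import random


def build_training_set(observed, synthetic, mode="mixed", seed=0):
    """Return a shuffled list of (sample, source) pairs.

    mode="mixed": observed samples plus synthetic samples.
    mode="synthetic_only": synthetic samples only.
    """
    if mode == "mixed":
        tagged = [(s, "observed") for s in observed]
        tagged += [(s, "synthetic") for s in synthetic]
    elif mode == "synthetic_only":
        tagged = [(s, "synthetic") for s in synthetic]
    else:
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)  # fixed seed so the shuffle is reproducible
    rng.shuffle(tagged)
    return tagged


def synthetic_fraction(tagged):
    """Share of samples in the set that are tagged as synthetic."""
    return sum(1 for _, source in tagged if source == "synthetic") / len(tagged)


# Placeholder samples: integers standing in for Ir time series.
observed = list(range(1241))
synthetic = list(range(414))
mixed = build_training_set(observed, synthetic, mode="mixed")
print(len(mixed))                 # 1655 = 1241 + 414
print(synthetic_fraction(mixed))  # 414 / 1655, roughly one quarter
```

Keeping the source tag on every sample makes it straightforward to report the observed-to-simulated ratio and to switch between the two configurations without rebuilding the data pipeline.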

Presentation information

[S-CG55] Driving Solid Earth Science through Machine Learning
