4:30 PM - 4:45 PM
[SCG53-05] Effects of Differences in Training Data on Tsunami Arrival Time Prediction Using Machine Learning
Keywords: Machine Learning, Neural Networks, Tsunami Simulation, Tsunami arrival time, Mean Square Error
Machine learning can be used to predict tsunamis almost instantaneously, within a certain margin of error. However, few studies have focused on tsunami arrival time. Instead, much of the existing work has examined pre-processing methods, such as convolution, that reduce the number of input variables and therefore the number of weight parameters to be optimized.
In general, machine learning prediction requires a large amount of training data. However, the relationship between the number of training data and the accuracy of tsunami arrival time prediction has not been clarified. Therefore, this research investigates how the number of training data affects the accuracy of tsunami arrival time prediction by machine learning.
Methodology
In this research, the arrival time of a tsunami is predicted from the initial water level distribution soon after an earthquake using machine learning.
First, numerical tsunami simulations were conducted to extract the initial water level distribution and the tsunami arrival time, which were used as training data for machine learning. The conceptual diagram of this research is shown in Figure 1. Fault parameters were generated using the Random Phase Model for three moment magnitudes (Mw) at equal intervals of 0.2: Mw 8.6, 8.8, and 9.0. The tsunami simulation was performed using the tsunami and storm surge simulator Q-Wave, which uses the nonlinear long wave equation as its governing equation and adopts a geographic coordinate system (latitude, longitude, and altitude). The main conditions of the numerical calculations are shown in Table 1. Two regions were selected to examine how topographic characteristics affect the results.
Next, neural networks (NNs) were used for learning and prediction. The initial water level distribution was used as input data, and the tsunami arrival time in the target area as output data. The conditions of the NN are shown in Table 2.
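The input-output mapping described above can be illustrated with a minimal sketch of a fully connected network trained on the mean squared error. Everything specific here is an assumption made for illustration: the single ReLU hidden layer, the layer sizes, the learning rate, and the function names (init_params, forward, train_step) are hypothetical and are not the conditions listed in Table 2.

```python
import numpy as np

def init_params(n_in, n_hidden, n_out, rng):
    # Small random weights; the layer sizes are illustrative assumptions,
    # not the NN conditions of Table 2.
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    # x: (n_cases, n_in) flattened initial water level distributions.
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU hidden layer
    y = h @ params["W2"] + params["b2"]  # linear output: arrival times
    return h, y

def train_step(params, x, t, lr=0.01):
    # One full-batch gradient descent step on the mean squared error.
    h, y = forward(params, x)
    err = y - t
    loss = float(np.mean(err ** 2))
    dy = 2.0 * err / err.size          # gradient of the loss w.r.t. y
    dW2 = h.T @ dy
    db2 = dy.sum(axis=0)
    dh = dy @ params["W2"].T
    dh[h <= 0.0] = 0.0                 # ReLU passes no gradient where inactive
    dW1 = x.T @ dh
    db1 = dh.sum(axis=0)
    params["W1"] -= lr * dW1
    params["b1"] -= lr * db1
    params["W2"] -= lr * dW2
    params["b2"] -= lr * db2
    return loss  # loss measured before the update
```

In such a sketch, each row of x would hold one simulated scenario and each column of t one location in the target area; how the actual inputs and outputs were scaled is not stated here.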
Evaluation
In this research, the mean squared error was used to evaluate accuracy. The total number of data was varied from 100 to 800 with a training-to-test ratio of 7:3, and the mean squared error was evaluated over the entire region covered by the test data. Because the test data were randomly selected, training was performed three times and the average value was plotted. The mean squared error decreased as the number of training data increased (Figure 2). However, the error also varied with the size of the earthquake and the region of interest. One possible reason is that a larger earthquake produces more locations reached by the tsunami, and thus a larger variance in tsunami arrival time.
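The evaluation procedure (a random 7:3 split, three repetitions, averaged mean squared error) can be sketched as follows. The helper names and the fit_predict callback are hypothetical; the callback stands in for whatever model is being trained, and the seed handling is an assumption rather than the procedure actually used.

```python
import numpy as np

def mean_squared_error(pred, true):
    # Mean of squared differences over all test cases and locations.
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.mean((pred - true) ** 2))

def split_7_3(n_samples, rng):
    # Random 7:3 split of sample indices into training and test sets.
    idx = rng.permutation(n_samples)
    n_train = (7 * n_samples) // 10
    return idx[:n_train], idx[n_train:]

def averaged_mse(x, t, fit_predict, n_repeats=3, seed=0):
    # fit_predict(x_train, t_train, x_test) -> predictions for x_test.
    # It is a placeholder for the model; any regressor could be plugged in.
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_repeats):
        train_idx, test_idx = split_7_3(len(x), rng)
        pred = fit_predict(x[train_idx], t[train_idx], x[test_idx])
        scores.append(mean_squared_error(pred, t[test_idx]))
    return float(np.mean(scores))
```

Calling averaged_mse for each total data size from 100 to 800 would give one averaged point per size, which is the kind of curve described above.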
Next, Figures 2 and 3 show the average difference between the predictions and the correct answers for each Mw and region, where the number of training data is 700 and the number of test data is 100. The red areas indicate locations where the machine learning prediction gives a later tsunami arrival time. On the other hand, Town A shows a strong tendency toward overestimation. Finally, Figure 4 shows histograms of the difference for locations where the tsunami arrives in the correct data for Mw 9.0. The prediction tends to be later (underestimation), with errors of 0 to 20 minutes in many cases, and the variation of the difference values is smaller in Town A than in City B.
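The difference maps and histograms described above rest on a signed difference between predicted and correct arrival times. A minimal sketch follows, assuming arrival times are stored in minutes as arrays of shape (test cases, locations). The function names, bin width, and histogram range are hypothetical choices for illustration.

```python
import numpy as np

def mean_signed_difference(pred, true):
    # Positive values mean the predicted arrival is later than the correct
    # one, which this abstract calls underestimation.
    diff = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    return diff.mean(axis=0)  # average over test cases, one value per location

def difference_histogram(diff, bin_width=5.0, lo=-30.0, hi=30.0):
    # Fixed-width bins in minutes; width and range are illustrative.
    edges = np.arange(lo, hi + bin_width, bin_width)
    counts, edges = np.histogram(diff, bins=edges)
    return counts, edges
```

Locations that the tsunami never reaches in the correct data would need to be masked out before averaging; how that was handled is not stated here.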
Conclusions
In this research, the tsunami arrival time was predicted by an NN from the initial water level distribution, using data obtained from tsunami simulations. The main findings of this research are as follows.
Increasing the number of training data decreases the error under all conditions (Mw and area of interest).
The tendency toward overestimation or underestimation varies by region, with a maximum underestimation of 20 minutes (the predicted arrival is later than the correct one).