11:15 AM - 11:30 AM
[SCG44-20] Development of a multi-GPU and multi-node numerical code for a large-scale simulation of slow-to-fast earthquakes
Keywords: slow earthquakes, numerical simulation, GPGPU
Slow earthquakes obey a scaling law relating their spatial and temporal sizes (Ide et al., 2007), and a scaling law is also found for fast (i.e., regular) earthquakes. Understanding the slow-to-fast transition remains a further challenge for these two multiscale phenomena. However, at least several million elements seem to be required to capture multiscale phenomena spanning 2 or 3 orders of magnitude in spatial scale. Faster numerical calculation is therefore necessary to run such multiscale simulations within a realistic time.
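To make the scale of the problem concrete, the following minimal sketch estimates the storage of a dense kernel matrix. It assumes one double-precision (8-byte) value per element pair, which is consistent with the use of DGEMV but is otherwise an assumption; the function name `dense_kernel_bytes` and the value of 3,000,000 elements (standing in for "several million") are hypothetical.

```python
def dense_kernel_bytes(n_elements, bytes_per_value=8):
    """Bytes needed for a dense n x n matrix (assumes one scalar per element pair)."""
    return n_elements * n_elements * bytes_per_value


# Model sizes quoted in the abstract, plus a hypothetical multi-million case.
for n in (93_000, 170_000, 3_000_000):
    gigabytes = dense_kernel_bytes(n) / 1e9
    print(f"N = {n:>9,}: {gigabytes:,.1f} GB")
```

Because the storage grows as N squared, the matrix quickly exceeds the memory of a single device, which is one reason to distribute it over several GPUs and nodes.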
I develop a numerical code to simulate slow earthquakes in a multi-GPU and multi-node environment to overcome this calculation-time problem. NVIDIA Fortran (previously known as PGI Fortran) is used to utilize GPUs in addition to CPUs, as each node of the NIED computer system has four NVIDIA Tesla V100 GPUs. One of the major bottlenecks to high-speed computation is the evaluation of the product of the elastostatic kernel matrix and a vector. To implement this calculation, I simply use DGEMV from the BLAS library on CPUs and from the cuBLAS library on GPUs. A hybrid calculation with GPUs and CPUs is adopted to exploit the larger total memory bandwidth of the node, as the performance of DGEMV is often limited by memory bandwidth. In this simulation, I adopt a rate- and state-dependent friction law (RS-law) with cutoff velocities. The temporal evolution of slip velocity is numerically simulated, incorporating the elastic response of a semi-infinite medium and a realistic configuration of the plate interface. The plate interface is represented by small triangular elements.
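The hybrid matrix-vector product can be illustrated by splitting the rows of the kernel matrix among devices and letting each device multiply its own block. The sketch below is plain Python, not the author's Fortran code: `matvec` stands in for one DGEMV call, and the function names and the device weights (four GPUs and one CPU share) are hypothetical placeholders, not measured bandwidths.

```python
def matvec(rows, x):
    """Dense matrix-vector product for a block of rows (stand-in for one DGEMV call)."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def split_rows(n_rows, weights):
    """Split n_rows into contiguous (start, stop) blocks proportional to weights."""
    total = sum(weights)
    bounds = [0]
    acc = 0.0
    for w in weights:
        acc += w
        bounds.append(round(n_rows * acc / total))
    bounds[-1] = n_rows  # guard against rounding so every row is assigned
    return list(zip(bounds[:-1], bounds[1:]))


def hybrid_matvec(kernel, x, weights):
    """Compute kernel @ x block by block, one block per (hypothetical) device."""
    y = []
    for start, stop in split_rows(len(kernel), weights):
        # In a real code each block would be handled concurrently by its device.
        y.extend(matvec(kernel[start:stop], x))
    return y


if __name__ == "__main__":
    kernel = [[float(i + j) for j in range(3)] for i in range(10)]
    x = [1.0, 2.0, 3.0]
    weights = [4, 4, 4, 4, 1]  # hypothetical: four GPUs and one CPU share
    print(split_rows(len(kernel), weights))
    print(hybrid_matvec(kernel, x, weights) == matvec(kernel, x))
```

Since each row block needs only its own part of the matrix and the full input vector, the blocks are independent; weighting the split by each device's memory bandwidth is one plausible way to balance a bandwidth-limited operation.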
To validate the newly developed code, I compared its results with those of the CPU-only code after 20,000 steps in a medium-sized model (N ~ 93,000), which is similar to the Shikoku model in Matsuzawa et al. (2013). The relative error between the results of the newly developed code and the CPU-only code is within 10^-10. The new code thus replicates the numerical result well.
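A comparison of this kind can be sketched as follows. The abstract does not state how the relative error is defined, so the element-wise maximum used here is an assumption, and the function name `max_relative_error` and the sample values are hypothetical.

```python
def max_relative_error(reference, candidate, floor=1e-300):
    """Largest element-wise relative difference between two result vectors.

    `floor` avoids division by zero where the reference value is exactly zero.
    """
    return max(
        abs(c - r) / max(abs(r), floor)
        for r, c in zip(reference, candidate)
    )


if __name__ == "__main__":
    cpu_only = [1.0e-9, 2.5e-7, 4.0e-6]          # hypothetical slip velocities
    hybrid = [v * (1.0 + 1.0e-12) for v in cpu_only]
    print(max_relative_error(cpu_only, hybrid) < 1e-10)
```

Small differences at this level are expected when the same sums are accumulated in a different order on different hardware.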
Then, I tested the code on a relatively large-scale model (N ~ 170,000) covering the Nankai and Hyuganada regions (e.g., Matsuzawa and Shibazaki, 2020). I use 16 GPU-CPU nodes of the NIED computer. The calculation is 1.5 times faster than with 256 CPU-only nodes. This means that the new code is about 24 times faster per node than the previous CPU-only code.
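The per-node figure follows from the node counts and the overall speedup quoted above. The short sketch below reproduces that arithmetic; the function name `per_node_speedup` is hypothetical.

```python
def per_node_speedup(reference_nodes, new_nodes, overall_speedup):
    """Speedup per node when fewer nodes finish the same job faster."""
    return overall_speedup * reference_nodes / new_nodes


# 16 GPU-CPU nodes run 1.5 times faster than 256 CPU-only nodes.
print(per_node_speedup(256, 16, 1.5))  # 24.0
```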